Old town of Dresden with TU Dresden logo

The 14th International Conference on Finite-State Methods and Natural Language Processing
(FSMNLP, September 23–25, 2019, Dresden, Germany)

Grammatical Framework: an Interlingual Grammar Formalism

Speaker: Aarne Ranta (University of Gothenburg, Sweden)

Grammatical Framework (GF) was born at Xerox Research Centre Europe in 1998. Its purpose was to provide a declarative grammar formalism for interlingual translation systems. The core of GF is Constructive Type Theory (CTT), also known as Logical Framework, which is used for building interlingual representations. On top of these representations, GF provides a functional programming language for defining reversible mappings from interlinguas to concrete languages, equivalent to Parallel Multiple Context-Free Grammars (PMCFG).

Open-source since 1999, GF has a world-wide community that has built comprehensive grammars for over 40 languages. GF is also used in several companies to build applications for translation, natural language generation, semantic analysis, chatbots, and dialogue systems. The focus has been on Controlled Natural Languages (CNL), but recent research has also combined GF with statistical and machine learning techniques, such as neural dependency parsing. In this way, GF can scale up to robust and wide-coverage language processing, without sacrificing explainability.

The tutorial is meant for an audience that has some experience with formal language theory and its use in practical implementations. However, it is self-contained and does not assume specific knowledge such as CTT or PMCFG. The structure is the following:

  1. Hands-on introduction (45 min). Interactive coding in the GF Cloud to get an idea of how GF works.
  2. Theoretical background (45 min). GF as a formalism and programming language, with references to its main inspirations (constructive type theory, Montague grammar, categorial grammars, XFST)
  3. The GF Ecosystem (30 min). Software tools, on-going academic research, commercial applications, and open-source community activities.

GF homepage: http://www.grammaticalframework.org/
Speaker’s homepage: http://www.cse.chalmers.se/~aarne/

A Survey of Recent Advances in Efficient Parsing for Graph Grammars

Speaker: Frank Drewes (Umeå University, Sweden)

Context-free graph grammars, in particular hyperedge replacement graph grammars, look back on over 30 years of history. They share many of the good properties of context-free string languages. Unfortunately, the complexity of parsing is the big exception: early results in the field showed that even for fixed grammars, the membership problem can be NP-complete. Moreover, the known results about polynomial parsing that were obtained afterwards, while constituting nice theoretical work, seemed to be of limited practical value. This is because they were either based on very ”impractical” restrictions, or the degree of the polynomial running time depended on the grammar and could thus become large.

In the current decade, the question received renewed interest because hyperedge replacement is one of the candidate formalisms for specifying semantic graphs in natural language processing. Using graph grammars in this area requires parsing algorithms that are not only polynomial in theory, but efficient in practice. Preferably, the degree of the polynomial bounding their running time should be a (small) constant independent of the grammar, or else it should depend on parameters not likely to be large. The talk will present an overview of results towards this goal, discussing their requirements, advantages, and disadvantages as well as a few possible directions for future work.

Speaker’s homepage: https://www.umu.se/en/staff/frank-drewes/

Latent Variable Grammars for Discontinuous Parsing

Speaker: Kilian Gebhardt (Technische Universität Dresden, Germany)

Latent variable context-free grammars are powerful models for predicting the syntactic structure of sentences (Matsuzaki et al. 2005, Petrov et al. 2006, Petrov and Klein 2007). When trained on annotated corpora, the resulting latent variables can be shown to capture different distributions for, e.g., NPs in subject and object position. Several languages (and in consequence also syntactic treebanks for these languages) such as Dutch (Lassy), German (NeGra, TiGer), but also English (Penn Treebank) contain structures that cannot be adequately modelled by context-free grammars. In consequence, a class of more power grammar formalisms called mildly context-sensitive has been studied (cf. Kallmeyer 2010). Although parsing with these models is polynomial in the length of the input sentence (Seki et al. 1991), it has for a long time been regarded prohibitively slow. However, in recent years it was shown that the application of mildly-context sensitive grammars is feasible in coarse-to-fine parsing approaches (van Cranenburgh 2013, Ruprecht and Denkinger 2019).

In this talk I consider how both the latent variable approach and mildly context-sensitive grammars can be joined and applied to discontinuous treebanks:

  1. A large class of latent variable grammars can be captured as a probabilistic RTG combined with an algebra. I show how the training methodology of latent variable PCFG can be generalized for this class.
  2. I recall two mildly context-sensitive models: Linear context-free rewriting systems (LCFRS, Vijay-Shanker et al. 1987) and hybrid grammars (Nederhof and Vogler 2014, Gebhardt et al. 2017). In particular, I consider the induction of hybrid grammars, which can be parametrized such that the polynomial complexity of parsing is of bounded degree. This way even hybrid grammars that are structurally equivalent to finite state automata can be obtained.
  3. I analyse different trends when training latent variable LCFRS and hybrid grammars on different discontinuous treebanks and applying them for parsing.

Speaker’s homepage: https://wwwtcs.inf.tu-dresden.de/~kilian/