{"title":"Grammatical Framework: an Interlingual Grammar Formalism","authors":"Aarne Ranta","doi":"10.18653/v1/W19-3101","DOIUrl":"https://doi.org/10.18653/v1/W19-3101","url":null,"abstract":"Grammatical Framework (GF) was born at Xerox Research Centre Europe in 1998. Its purpose was to provide a declarative grammar formalism for interlingual translation systems. The core of GF is Constructive Type Theory (CTT), also known as Logical Framework, which is used for building interlingual representations. On top of these representations, GF provides a functional programming language for defining reversible mappings from interlinguas to concrete languages, equivalent to Parallel Multiple Context-Free Grammars (PMCFG). Open-source since 1999, GF has a world-wide community that has built comprehensive grammars for over 40 languages. GF is also used in several companies to build applications for translation, natural language generation, semantic analysis, chatbots, and dialogue systems. The focus has been on Controlled Natural Languages (CNL), but recent research has also combined GF with statistical and machine learning techniques, such as neural dependency parsing. In this way, GF can scale up to robust and wide-coverage language processing, without sacrificing explainability. The tutorial is meant for an audience that has some experience with formal language theory and its use in practical implementations. However, it is self-contained and does not assume specific knowledge such as CTT or PMCFG. The structure is the following: 1. Hands-on introduction (45 min). Interactive coding in the GF Cloud to get an idea of how GF works. 2. Theoretical background (45 min). GF as a formalism and programming language, with references to its main inspirations (constructive type theory, Montague grammar, categorial grammars, XFST). 3. The GF Ecosystem (30 min). Software tools, on-going academic research, commercial applications, and open-source community activities.","PeriodicalId":286427,"journal":{"name":"Finite-State Methods and Natural Language Processing","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131778710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Latent Variable Grammars for Discontinuous Parsing","authors":"Kilian Gebhardt","doi":"10.18653/v1/W19-3103","DOIUrl":"https://doi.org/10.18653/v1/W19-3103","url":null,"abstract":"Latent variable context-free grammars are powerful models for predicting the syntactic structure of sentences (Matsuzaki, Miyao, and Tsujii 2005; Petrov, Barrett, et al. 2006; Petrov and Klein 2007). When trained on annotated corpora, the resulting latent variables can be shown to capture different distributions for, e.g., NPs in subject and object position. Several languages (and in consequence also syntactic treebanks for these languages) such as Dutch (Lassy van Noord 2009), German (NeGra, Skut et al. 1997; TiGer Brants et al. 2004), but also English (Penn Treebank, Marcus, Santorini, and Marcinkiewicz 1993, Evang and Kallmeyer 2011) contain structures that cannot be adequately modelled by context-free grammars. In consequence, a class of more powerful grammar formalisms called mildly context-sensitive has been studied (cf. Kallmeyer 2010). Although parsing with these models is polynomial in the length of the input sentence (Seki et al. 1991), it has for a long time been regarded as prohibitively slow. However, in recent years it was shown that the application of mildly context-sensitive grammars is feasible in coarse-to-fine parsing approaches (van Cranenburgh 2012; Ruprecht and Denkinger 2019). In this talk I consider how both the latent variable approach and mildly context-sensitive grammars can be joined and applied to discontinuous treebanks: 1. A large class of latent variable grammars can be captured as a probabilistic regular tree grammar combined with an algebra. I show how the training methodology of latent variable PCFG can be generalized for this class. 2. I recall two mildly context-sensitive grammar formalisms: linear context-free rewriting systems (LCFRS, Vijay-Shanker, Weir, and Joshi 1987) and hybrid grammars (Nederhof and Vogler 2014; Gebhardt, Nederhof, and Vogler 2017). In particular, I consider the induction of hybrid grammars, which can be parametrized such that the polynomial complexity of parsing is of bounded degree. This way also hybrid grammars that are structurally equivalent to finite state automata can be obtained. 3. I analyse different trends when training latent variable LCFRS and hybrid grammars on different discontinuous treebanks and applying them for parsing.","PeriodicalId":286427,"journal":{"name":"Finite-State Methods and Natural Language Processing","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131221197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MSO with tests and reducts","authors":"T. Fernando, David Woods, Carl Vogel","doi":"10.18653/v1/W19-3106","DOIUrl":"https://doi.org/10.18653/v1/W19-3106","url":null,"abstract":"Tests added to Kleene algebra (by Kozen and others) are considered within Monadic Second Order logic over strings, where they are likened to statives in natural language. Reducts are formed over tests and non-tests alike, specifying what is observable. Notions of temporal granularity are based on observable change, under the assumption that a finite set bounds what is observable (with the possibility of stretching such bounds by moving to a larger finite set). String projections at different granularities are conjoined by superpositions that provide another variant of concatenation for Booleans.","PeriodicalId":286427,"journal":{"name":"Finite-State Methods and Natural Language Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132293246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A FST Description of Noun and Verb Morphology of Azarbaijani Turkish","authors":"R. Ehsani, Berke Özenç, E. Solak","doi":"10.18653/v1/W17-4008","DOIUrl":"https://doi.org/10.18653/v1/W17-4008","url":null,"abstract":"We give a FST description of nominal and finite verb morphology of Azarbaijani Turkish. We use a hybrid approach where nominal inflection is expressed as a slot-based paradigm and major parts of verb inflection are expressed as optional paths on the FST. We collapse adjective and noun categories in a single nominal category as they behave similarly as far as their paradigms are concerned. Thus, we defer a more precise identification of POS to further down the NLP pipeline.","PeriodicalId":286427,"journal":{"name":"Finite-State Methods and Natural Language Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124250576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finite-State Morphological Analysis for Marathi","authors":"Vinit Ravishankar, Francis M. Tyers","doi":"10.18653/v1/W17-4006","DOIUrl":"https://doi.org/10.18653/v1/W17-4006","url":null,"abstract":"This paper describes the development of free/open-source morphological descriptions for Marathi, an Indo-Aryan language spoken in the state of Maharashtra in India. We describe the conversion and usage of an existing Latin-based lexicon for our Devanagari-based analyser, taking into account the distinction between full vowels and diacritics, that is not adequately captured by the Latin. Marathi displays elements of both fusional and agglutinative morphology, which gives us different ways to potentially treat the morphology; philosophically, we approach our analyser by treating the morphology system as a three-layer affixing system. We use the lttoolbox lexicon formalism for describing the finite-state transducer, and attempt to work within a morphological framework that would allow for some consistency across Indo-Aryan languages, enabling machine translation across language pairs. An evaluation of our finite-state transducer shows that the coverage is adequate, over 80% on two corpora, and the precision is good (over 97%).","PeriodicalId":286427,"journal":{"name":"Finite-State Methods and Natural Language Processing","volume":"214 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115976610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating an Automata Approach to Query Containment","authors":"Michael Jason Minock","doi":"10.18653/v1/W17-4010","DOIUrl":"https://doi.org/10.18653/v1/W17-4010","url":null,"abstract":"Given two queries Qsuper and Qsub, query containment is the problem of determining if Qsub(D) ⊆ Qsuper(D) for all databases D. This problem has long been explored, but to our knowledge no one has e ...","PeriodicalId":286427,"journal":{"name":"Finite-State Methods and Natural Language Processing","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132810124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-tape Computing with Synchronous Relations","authors":"C. Wurm, Simon Petitjean","doi":"10.18653/v1/W17-4005","DOIUrl":"https://doi.org/10.18653/v1/W17-4005","url":null,"abstract":"We sketch an approach to encode relations of arbitrary arity as simple languages. Our main focus will be faithfulness of the encoding: we prove that with normal finite-state methods, it is impossible to properly encode the full class of rational (i.e. transducer recognizable) relations; however, there is a simple encoding for the synchronous rational relations. We present this encoding and show how standard finite-state methods can be used with this encoding, that is, arbitrary operations on relations can be encoded as operations on the code. Finally we sketch an implementation using an existing library (FOMA).","PeriodicalId":286427,"journal":{"name":"Finite-State Methods and Natural Language Processing","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124212623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Failure Transducers and Applications in Knowledge-Based Text Processing","authors":"S. Mihov, K. Schulz","doi":"10.18653/v1/W17-4001","DOIUrl":"https://doi.org/10.18653/v1/W17-4001","url":null,"abstract":"Finite-state devices encoding lexica and related knowledge bases often become very large. A well-known technique for reducing the size of finite-state automata is the use of failure transitions. Here we generalize the concept of failure transitions for finite-state automata to the case of subsequential transducers. Failure transitions in the new sense do not have input but may produce output. As an application field for failure transducers we consider text rewriting with large rewrite lexica under the leftmost-longest replacement strategy. It is shown that using failure transducers leads to a huge space reduction compared to the use of standard subsequential transducers. As a concrete example we show how all Wikipedia concepts in an input text can be linked in an online manner with the Wikipedia pages of the concepts using failure transducers.","PeriodicalId":286427,"journal":{"name":"Finite-State Methods and Natural Language Processing","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127111721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Word Transduction for Addressing the OOV Problem in Machine Translation for Similar Resource-Scarce Languages","authors":"Anssi Yli-Jyrä","doi":"10.18653/v1/W17-4009","DOIUrl":"https://doi.org/10.18653/v1/W17-4009","url":null,"abstract":"Wiktionary provides lexical information for an increasing number of languages, including morphological inflection tables. It is a good resource for automatically learning rule-based analysis of the inflectional morphology of a language. This paper performs an extensive evaluation of a method to extract generalized paradigms from morphological inflection tables, which can be converted to weighted and unweighted finite transducers for morphological parsing and generation. The inflection tables of 55 languages from the English edition of Wiktionary are converted to such general paradigms, and the performance of the probabilistic parsers based on these paradigms are tested.","PeriodicalId":286427,"journal":{"name":"Finite-State Methods and Natural Language Processing","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127264759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transliterated Mobile Keyboard Input via Weighted Finite-State Transducers","authors":"Lars Hellsten, Brian Roark, Prasoon Goyal, Cyril Allauzen, F. Beaufays, Tom Y. Ouyang, M. Riley, David Rybach","doi":"10.18653/v1/W17-4002","DOIUrl":"https://doi.org/10.18653/v1/W17-4002","url":null,"abstract":"We present an extension to a mobile keyboard input decoder based on finite-state transducers that provides general transliteration support, and demonstrate its use for input of South Asian languages using a QWERTY keyboard. On-device keyboard decoders must operate under strict latency and memory constraints, and we present several transducer optimizations that allow for high accuracy decoding under such constraints. Our methods yield substantial accuracy improvements and latency reductions over an existing baseline transliteration keyboard approach. The resulting system was launched for 22 languages in Google Gboard in the first half of 2017.","PeriodicalId":286427,"journal":{"name":"Finite-State Methods and Natural Language Processing","volume":"169 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124888317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}