Finite-State Methods and Natural Language Processing: Latest Articles

Grammatical Framework: an Interlingual Grammar Formalism
Finite-State Methods and Natural Language Processing Pub Date: 2019-09-01 DOI: 10.18653/v1/W19-3101
Aarne Ranta
Abstract: Grammatical Framework (GF) was born at Xerox Research Centre Europe in 1998. Its purpose was to provide a declarative grammar formalism for interlingual translation systems. The core of GF is Constructive Type Theory (CTT), also known as Logical Framework, which is used for building interlingual representations. On top of these representations, GF provides a functional programming language for defining reversible mappings from interlinguas to concrete languages, equivalent to Parallel Multiple Context-Free Grammars (PMCFG). Open-source since 1999, GF has a world-wide community that has built comprehensive grammars for over 40 languages. GF is also used in several companies to build applications for translation, natural language generation, semantic analysis, chatbots, and dialogue systems. The focus has been on Controlled Natural Languages (CNL), but recent research has also combined GF with statistical and machine learning techniques, such as neural dependency parsing. In this way, GF can scale up to robust and wide-coverage language processing, without sacrificing explainability. The tutorial is meant for an audience that has some experience with formal language theory and its use in practical implementations. However, it is self-contained and does not assume specific knowledge such as CTT or PMCFG. The structure is the following:
1. Hands-on introduction (45 min). Interactive coding in the GF Cloud to get an idea of how GF works.
2. Theoretical background (45 min). GF as a formalism and programming language, with references to its main inspirations (constructive type theory, Montague grammar, categorial grammars, XFST).
3. The GF Ecosystem (30 min). Software tools, ongoing academic research, commercial applications, and open-source community activities.
Citations: 0
Latent Variable Grammars for Discontinuous Parsing
Finite-State Methods and Natural Language Processing Pub Date: 2019-09-01 DOI: 10.18653/v1/W19-3103
Kilian Gebhardt
Abstract: Latent variable context-free grammars are powerful models for predicting the syntactic structure of sentences (Matsuzaki, Miyao, and Tsujii 2005; Petrov, Barrett, et al. 2006; Petrov and Klein 2007). When trained on annotated corpora, the resulting latent variables can be shown to capture different distributions for, e.g., NPs in subject and object position. Several languages (and in consequence also syntactic treebanks for these languages) such as Dutch (Lassy, van Noord 2009), German (NeGra, Skut et al. 1997; TiGer, Brants et al. 2004), but also English (Penn Treebank, Marcus, Santorini, and Marcinkiewicz 1993; Evang and Kallmeyer 2011) contain structures that cannot be adequately modelled by context-free grammars. In consequence, a class of more powerful grammar formalisms called mildly context-sensitive has been studied (cf. Kallmeyer 2010). Although parsing with these models is polynomial in the length of the input sentence (Seki et al. 1991), it has for a long time been regarded as prohibitively slow. However, in recent years it was shown that the application of mildly context-sensitive grammars is feasible in coarse-to-fine parsing approaches (van Cranenburgh 2012; Ruprecht and Denkinger 2019). In this talk I consider how both the latent variable approach and mildly context-sensitive grammars can be joined and applied to discontinuous treebanks:
1. A large class of latent variable grammars can be captured as a probabilistic regular tree grammar combined with an algebra. I show how the training methodology of latent variable PCFG can be generalized for this class.
2. I recall two mildly context-sensitive grammar formalisms: linear context-free rewriting systems (LCFRS, Vijay-Shanker, Weir, and Joshi 1987) and hybrid grammars (Nederhof and Vogler 2014; Gebhardt, Nederhof, and Vogler 2017). In particular, I consider the induction of hybrid grammars, which can be parametrized such that the polynomial complexity of parsing is of bounded degree. This way also hybrid grammars that are structurally equivalent to finite state automata can be obtained.
3. I analyse different trends when training latent variable LCFRS and hybrid grammars on different discontinuous treebanks and applying them for parsing.
Citations: 0
MSO with tests and reducts
Finite-State Methods and Natural Language Processing Pub Date: 2019-09-01 DOI: 10.18653/v1/W19-3106
T. Fernando, David Woods, Carl Vogel
Abstract: Tests added to Kleene algebra (by Kozen and others) are considered within Monadic Second Order logic over strings, where they are likened to statives in natural language. Reducts are formed over tests and non-tests alike, specifying what is observable. Notions of temporal granularity are based on observable change, under the assumption that a finite set bounds what is observable (with the possibility of stretching such bounds by moving to a larger finite set). String projections at different granularities are conjoined by superpositions that provide another variant of concatenation for Booleans.
Citations: 1
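
The ideas of reducts, granularity, and superposition in the abstract above can be illustrated with a small Python sketch. This is only one possible reading, not the paper's own definitions: it assumes a string is a list of sets of observable symbols, that a reduct intersects every position with the observable set, that granularity is obtained by deleting adjacent repetitions (no observable change), and that superposition is the position-wise union of equal-length strings. The names `reduct`, `block_compress`, and `superpose` are hypothetical.

```python
def reduct(string, observable):
    """Keep only the symbols in `observable` at every position (assumed reading of a reduct)."""
    return [box & observable for box in string]

def block_compress(string):
    """Delete adjacent repeats, so only positions with observable change remain (assumption)."""
    out = []
    for box in string:
        if not out or out[-1] != box:
            out.append(box)
    return out

def superpose(left, right):
    """Position-wise union of two strings of equal length (assumed reading of superposition)."""
    if len(left) != len(right):
        raise ValueError("superposition here is only defined for equal-length strings")
    return [a | b for a, b in zip(left, right)]

# A four-position string over the symbols "rain" and "dawn" (illustrative data only).
s = [frozenset({"rain"}), frozenset({"rain", "dawn"}), frozenset({"rain"}), frozenset()]

# Observing only "rain" hides the change caused by "dawn", so the string shrinks.
coarse = block_compress(reduct(s, frozenset({"rain"})))
print(coarse)

combined = superpose([frozenset({"a"}), frozenset()], [frozenset({"b"}), frozenset({"c"})])
print(combined)
```

Observing fewer symbols yields a shorter string, which is the sense in which the sketch ties granularity to what is observable.
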
A FST Description of Noun and Verb Morphology of Azarbaijani Turkish
Finite-State Methods and Natural Language Processing Pub Date: 2017-09-01 DOI: 10.18653/v1/W17-4008
R. Ehsani, Berke Özenç, E. Solak
Abstract: We give a FST description of nominal and finite verb morphology of Azarbaijani Turkish. We use a hybrid approach where nominal inflection is expressed as a slot-based paradigm and major parts of verb inflection are expressed as optional paths on the FST. We collapse adjective and noun categories into a single nominal category as they behave similarly as far as their paradigms are concerned. Thus, we defer a more precise identification of POS to further down the NLP pipeline.
Citations: 2
Finite-State Morphological Analysis for Marathi
Finite-State Methods and Natural Language Processing Pub Date: 2017-09-01 DOI: 10.18653/v1/W17-4006
Vinit Ravishankar, Francis M. Tyers
Abstract: This paper describes the development of free/open-source morphological descriptions for Marathi, an Indo-Aryan language spoken in the state of Maharashtra in India. We describe the conversion and usage of an existing Latin-based lexicon for our Devanagari-based analyser, taking into account the distinction between full vowels and diacritics, which is not adequately captured by the Latin. Marathi displays elements of both fusional and agglutinative morphology, which gives us different ways to potentially treat the morphology; philosophically, we approach our analyser by treating the morphology system as a three-layer affixing system. We use the lttoolbox lexicon formalism for describing the finite-state transducer, and attempt to work within a morphological framework that would allow for some consistency across Indo-Aryan languages, enabling machine translation across language pairs. An evaluation of our finite-state transducer shows that the coverage is adequate, over 80% on two corpora, and the precision is good (over 97%).
Citations: 5
Evaluating an Automata Approach to Query Containment
Finite-State Methods and Natural Language Processing Pub Date: 2017-09-01 DOI: 10.18653/v1/W17-4010
Michael Jason Minock
Abstract: Given two queries Qsuper and Qsub, query containment is the problem of determining if Qsub(D) ⊆ Qsuper(D) for all databases D. This problem has long been explored, but to our knowledge no one has e ...
Citations: 0
Multi-tape Computing with Synchronous Relations
Finite-State Methods and Natural Language Processing Pub Date: 2017-09-01 DOI: 10.18653/v1/W17-4005
C. Wurm, Simon Petitjean
Abstract: We sketch an approach to encode relations of arbitrary arity as simple languages. Our main focus will be faithfulness of the encoding: we prove that with normal finite-state methods, it is impossible to properly encode the full class of rational (i.e. transducer recognizable) relations; however, there is a simple encoding for the synchronous rational relations. We present this encoding and show how standard finite-state methods can be used with this encoding, that is, arbitrary operations on relations can be encoded as operations on the code. Finally we sketch an implementation using an existing library (FOMA).
Citations: 0
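
A common way to encode a tuple of strings as a single string, which may or may not coincide with the encoding used in the paper above, is the padded convolution: the tapes are read in lockstep and shorter strings are filled with a padding symbol. The Python sketch below illustrates that idea only; the padding symbol `#` and the function names `convolve` and `deconvolve` are assumptions made for this example.

```python
from itertools import zip_longest

PAD = "#"  # hypothetical padding symbol, assumed not to occur in the input strings

def convolve(*words):
    """Encode a tuple of strings as one string over a product alphabet (list of tuples)."""
    return list(zip_longest(*words, fillvalue=PAD))

def deconvolve(code):
    """Recover the tuple of strings by reading each tape and dropping the padding."""
    return tuple("".join(ch for ch in tape if ch != PAD) for tape in zip(*code))

encoded = convolve("abc", "de")
print(encoded)
print(deconvolve(encoded))
```

Because every position of the code carries one symbol per tape, an ordinary one-tape automaton over the product alphabet can process all tapes at once. Note that in this sketch an empty code loses the arity of the tuple.
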
Failure Transducers and Applications in Knowledge-Based Text Processing
Finite-State Methods and Natural Language Processing Pub Date: 2017-09-01 DOI: 10.18653/v1/W17-4001
S. Mihov, K. Schulz
Abstract: Finite-state devices encoding lexica and related knowledge bases often become very large. A well-known technique for reducing the size of finite-state automata is the use of failure transitions. Here we generalize the concept of failure transitions for finite-state automata to the case of subsequential transducers. Failure transitions in the new sense do not have input but may produce output. As an application field for failure transducers we consider text rewriting with large rewrite lexica under the leftmost-longest replacement strategy. It is shown that using failure transducers leads to a huge space reduction compared to the use of standard subsequential transducers. As a concrete example we show how all Wikipedia concepts in an input text can be linked in an online manner with the Wikipedia pages of the concepts using failure transducers.
Citations: 0
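
The leftmost-longest replacement strategy mentioned in the abstract above can be sketched in a few lines of Python. This is a plain trie-based illustration of the strategy, not the failure-transducer construction from the paper: it rescans the text after each match, which is exactly the overhead that failure transitions are meant to avoid. The function names and the toy lexicon are hypothetical.

```python
def build_trie(lexicon):
    """Build a nested-dict trie; the key None stores the replacement for a complete entry."""
    root = {}
    for source, target in lexicon.items():
        node = root
        for ch in source:
            node = node.setdefault(ch, {})
        node[None] = target
    return root

def rewrite(text, lexicon):
    """Rewrite text, always replacing the leftmost match and, among those, the longest one."""
    trie = build_trie(lexicon)
    out = []
    i = 0
    while i < len(text):
        node = trie
        best_end, best_target = None, None
        j = i
        # Walk the trie as far as the text allows, remembering the longest complete entry.
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if None in node:
                best_end, best_target = j, node[None]
        if best_end is None:
            out.append(text[i])  # no entry starts here: copy one character
            i += 1
        else:
            out.append(best_target)  # emit the replacement and skip the matched span
            i = best_end
    return "".join(out)

toy_lexicon = {"a": "1", "ab": "2", "bc": "3"}  # illustrative data only
result = rewrite("abcab", toy_lexicon)
print(result)
```

In the example, `ab` wins over `a` at position 0 because it is longer, and `bc` is never applied because the `b` has already been consumed by the leftmost match.
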
Word Transduction for Addressing the OOV Problem in Machine Translation for Similar Resource-Scarce Languages
Finite-State Methods and Natural Language Processing Pub Date: 2017-09-01 DOI: 10.18653/v1/W17-4009
Anssi Yli-Jyrä
Abstract: Wiktionary provides lexical information for an increasing number of languages, including morphological inflection tables. It is a good resource for automatically learning rule-based analysis of the inflectional morphology of a language. This paper performs an extensive evaluation of a method to extract generalized paradigms from morphological inflection tables, which can be converted to weighted and unweighted finite transducers for morphological parsing and generation. The inflection tables of 55 languages from the English edition of Wiktionary are converted to such general paradigms, and the performance of the probabilistic parsers based on these paradigms is tested.
Citations: 7
Transliterated Mobile Keyboard Input via Weighted Finite-State Transducers
Finite-State Methods and Natural Language Processing Pub Date: 2017-09-01 DOI: 10.18653/v1/W17-4002
Lars Hellsten, Brian Roark, Prasoon Goyal, Cyril Allauzen, F. Beaufays, Tom Y. Ouyang, M. Riley, David Rybach
Abstract: We present an extension to a mobile keyboard input decoder based on finite-state transducers that provides general transliteration support, and demonstrate its use for input of South Asian languages using a QWERTY keyboard. On-device keyboard decoders must operate under strict latency and memory constraints, and we present several transducer optimizations that allow for high accuracy decoding under such constraints. Our methods yield substantial accuracy improvements and latency reductions over an existing baseline transliteration keyboard approach. The resulting system was launched for 22 languages in Google Gboard in the first half of 2017.
Citations: 23