Finite-State Methods and Natural Language Processing: Latest Articles

Transition-Based Coding and Formal Language Theory for Ordered Digraphs
Finite-State Methods and Natural Language Processing Pub Date : 2019-09-23 DOI: 10.18653/v1/W19-3115
Anssi Yli-Jyrä
Abstract: Transition-based parsing of natural language uses transition systems to build directed annotation graphs (digraphs) for sentences. In this paper, we define, for an arbitrary ordered digraph, a unique decomposition and a corresponding linear encoding that are associated bijectively with each other via a new transition system. These results give us an efficient and succinct representation for digraphs and sets of digraphs. Based on the system and our analysis of its syntactic properties, we give structural bounds under which the set of encoded digraphs is restricted and becomes a context-free or a regular string language. The context-free restriction is essentially a superset of the encodings used previously to characterize properties of noncrossing digraphs and to solve maximal subgraph problems. The regular restriction with a tight bound is shown to capture the Universal Dependencies v2.4 treebanks in linguistics.
Citations: 3
Regular transductions with MCFG input syntax
Finite-State Methods and Natural Language Processing Pub Date : 2019-09-23 DOI: 10.18653/v1/W19-3109
M. Nederhof, H. Vogler
Abstract: We show that regular transductions for which the input part is generated by some multiple context-free grammar can be simulated by synchronous multiple context-free grammars. We prove that synchronous multiple context-free grammars are strictly more powerful than this combination of regular transductions and multiple context-free grammars.
Citations: 1
Finite State Transducer Calculus for Whole Word Morphology
Finite-State Methods and Natural Language Processing Pub Date : 2019-09-01 DOI: 10.18653/v1/W19-3107
Maciej Janicki
Abstract: The research on machine learning of morphology often involves formulating morphological descriptions directly on surface forms of words. As the established two-level morphology paradigm requires knowledge of the underlying structure, it is not widely used in such settings. In this paper, we propose a formalism describing structural relationships between words based on theories of morphology that reject the notions of internal word structure and morpheme. The formalism covers a wide variety of morphological phenomena (including non-concatenative ones like stem vowel alternation) without the need for workarounds and extensions. Furthermore, we show that morphological rules formulated in such a way can be easily translated to FSTs, which enables us to derive performant approaches to morphological analysis, generation and automatic rule discovery.
Citations: 0
Using Meta-Morph Rules to develop Morphological Analysers: A case study concerning Tamil
Finite-State Methods and Natural Language Processing Pub Date : 2019-09-01 DOI: 10.18653/v1/W19-3111
Kengatharaiyer Sarveswaran, G. Dias, Miriam Butt
Abstract: This paper describes a new and larger coverage Finite-State Morphological Analyser (FSM) and Generator for the Dravidian language Tamil. The FSM has been developed in the context of computational grammar engineering, adhering to the standards of the ParGram effort. Tamil is a morphologically rich language and the interaction between linguistic analysis and formal implementation is complex, resulting in a challenging task. In order to allow the development of the FSM to focus more on the linguistic analysis and less on the formal details, we have developed a system of meta-morph(ology) rules along with a script which translates these rules into FSM processable representations. The introduction of meta-morph rules makes it possible for computationally naive linguists to interact with the system and to expand it in future work. We found that the meta-morph rules help to express linguistic generalisations and reduce the manual effort of writing lexical classes for morphological analysis. Our Tamil FSM currently handles mainly the inflectional morphology of 3,300 verb roots and their 260 forms. Further, it also has a lexicon of approximately 100,000 nouns along with a guesser to handle out-of-vocabulary items. Although the Tamil FSM was primarily developed to be part of a computational grammar, it can also be used as a web or stand-alone application for other NLP tasks, as per general ParGram practice.
Citations: 10
Latin script keyboards for South Asian languages with finite-state normalization
Finite-State Methods and Natural Language Processing Pub Date : 2019-09-01 DOI: 10.18653/v1/W19-3114
Lawrence Wolf-Sonkin, Vlad Schogol, Brian Roark, M. Riley
Abstract: The use of the Latin script for text entry of South Asian languages is common, even though there is no standard orthography for these languages in the script. We explore several compact finite-state architectures that permit variable spellings of words during mobile text entry. We find that approaches making use of transliteration transducers provide large accuracy improvements over baselines, but that simpler approaches involving a compact representation of many attested alternatives yield much of the accuracy gain. This is particularly important when operating under constraints on model size (e.g., on inexpensive mobile devices with limited storage and memory for keyboard models), and on speed of inference, since people typing on mobile keyboards expect no perceptual delay in keyboard responsiveness.
Citations: 6
Distilling weighted finite automata from arbitrary probabilistic models
Finite-State Methods and Natural Language Processing Pub Date : 2019-09-01 DOI: 10.18653/v1/W19-3112
A. Suresh, Brian Roark, M. Riley, Vlad Schogol
Abstract: Weighted finite automata (WFA) are often used to represent probabilistic models, such as n-gram language models, since they are efficient for recognition tasks in time and space. The probabilistic source to be represented as a WFA, however, may come in many forms. Given a generic probabilistic model over sequences, we propose an algorithm to approximate it as a weighted finite automaton such that the Kullback-Leibler divergence between the source model and the WFA target model is minimized. The proposed algorithm involves a counting step and a difference-of-convex optimization, both of which can be performed efficiently. We demonstrate the usefulness of our approach on some tasks including distilling n-gram models from neural models.
Citations: 6
Bottom-Up Unranked Tree-to-Graph Transducers for Translation into Semantic Graphs
Finite-State Methods and Natural Language Processing Pub Date : 2019-09-01 DOI: 10.18653/v1/W19-3104
Johanna Björklund, Shay B. Cohen, F. Drewes, G. Satta
Abstract: We propose a formal model for translating unranked syntactic trees, such as dependency trees, into semantic graphs. These tree-to-graph transducers can serve as a formal basis of transition systems for semantic parsing which recently have been shown to perform very well, yet hitherto lack formalization. Our model features "extended" rules and an arc-factored normal form, comes with an efficient translation algorithm, and can be equipped with weights in a straightforward manner.
Citations: 1
A Syntactically Expressive Morphological Analyzer for Turkish
Finite-State Methods and Natural Language Processing Pub Date : 2019-09-01 DOI: 10.18653/v1/W19-3110
Adnan Ozturel, Tolga Kayadelen, Isin Demirsahin
Abstract: We present a broad coverage model of Turkish morphology and an open-source morphological analyzer that implements it. The model captures intricacies of the Turkish morphology-syntax interface and thus could be used as a baseline that guides language model development. It introduces a novel fine part-of-speech tagset and a fine-grained affix inventory, and represents morphotactics without zero-derivations. The morphological analyzer is freely available. It consists of modular reusable components of human-annotated gold standard lexicons, and implements Turkish morphotactics as finite-state transducers using OpenFst and morphophonemic processes as Thrax grammars.
Citations: 9
On the Compression of Lexicon Transducers
Finite-State Methods and Natural Language Processing Pub Date : 2019-09-01 DOI: 10.18653/v1/W19-3105
Marco Cognetta, Cyril Allauzen, M. Riley
Abstract: In finite-state language processing pipelines, a lexicon is often a key component. It needs to be comprehensive to ensure accuracy, reducing out-of-vocabulary misses. However, in memory-constrained environments (e.g., mobile phones), the size of the component automata must be kept small. Indeed, a delicate balance between comprehensiveness, speed, and memory must be struck to conform to device requirements while providing a good user experience. In this paper, we describe a compression scheme for lexicons when represented as finite-state transducers. We efficiently encode the graph of the transducer while storing transition labels separately. The graph encoding scheme is based on the LOUDS (Level Order Unary Degree Sequence) tree representation, which has constant time tree traversal for queries while being information-theoretically optimal in space. We find that our encoding is near the theoretical lower bound for such graphs and substantially outperforms more traditional representations in space while remaining competitive in latency benchmarks.
Citations: 0
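Note on the entry above: the LOUDS idea named in the abstract of "On the Compression of Lexicon Transducers" can be illustrated with a minimal sketch of the plain tree encoding. This is not the paper's transducer compression scheme. It covers only the standard tree case, where nodes are visited in level order and each node contributes its child count in unary (one 1 per child, then a 0). The function name `louds_encode`, the dictionary input format, and the example tree are assumptions made for this illustration.

```python
from collections import deque


def louds_encode(children, root):
    """Return the LOUDS bit string of a rooted ordered tree.

    `children` maps each node to the ordered list of its children
    (a hypothetical input format chosen for this sketch).
    """
    # Conventional prefix: a super-root whose single child is the real root.
    bits = ["10"]
    queue = deque([root])
    while queue:
        node = queue.popleft()  # level-order (breadth-first) visit
        kids = children.get(node, [])
        # Unary degree: one '1' per child, closed by a single '0'.
        bits.append("1" * len(kids) + "0")
        queue.extend(kids)
    return "".join(bits)


# Example tree: a has children b and c; b has child d.
tree = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
encoded = louds_encode(tree, "a")
print(encoded)
```

With the super-root prefix, a tree of n nodes takes 2n + 1 bits. Constant-time navigation additionally needs rank and select support over the bit string, which this sketch leaves out.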
Weighted parsing for grammar-based language models
Finite-State Methods and Natural Language Processing Pub Date : 2019-09-01 DOI: 10.18653/v1/W19-3108
Richard Mörbitz, H. Vogler
Abstract: We develop a general framework for weighted parsing which is built on top of grammar-based language models and employs flexible weight algebras. It generalizes previous work in that area (semiring parsing, weighted deductive parsing) and also covers applications outside the classical scope of parsing, e.g., algebraic dynamic programming. We show an algorithm which terminates and is correct for a large class of weighted grammar-based language models.
Citations: 3