{"title":"Transducer Minimization and Information Compression for NooJ Dictionaries","authors":"Slim Mesfar, M. Silberztein","doi":"10.3233/978-1-58603-975-2-110","DOIUrl":"https://doi.org/10.3233/978-1-58603-975-2-110","url":null,"abstract":"In this paper, we describe the use of an incremental construction method for minimal, acyclic, deterministic FSTs. The approach consists in constructing a transducer in a single step by adding new strings one by one and minimizing the resultant automaton incrementally. Then, we present a new method to encode the morphological information associated with the dictionary entries. The new encoding unifies a large number of word forms' analyses, thus reducing the number of terminal states of the dictionary's FST, which triggers a more efficient minimization process. Finally, we present experimental results on the FST that represents the Arabic dictionary.","PeriodicalId":286427,"journal":{"name":"Finite-State Methods and Natural Language Processing","volume":"25 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125672917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finite State Models for the Generation of Large Corpora of Natural Language Texts","authors":"Domenico Cantone, S. Cristofaro, S. Faro, Emanuele Giaquinta","doi":"10.3233/978-1-58603-975-2-175","DOIUrl":"https://doi.org/10.3233/978-1-58603-975-2-175","url":null,"abstract":"Natural languages are probably one of the most common types of input for text processing algorithms. Therefore, it is often desirable to have a large training/testing set of input of this kind, especially when dealing with algorithms tuned for natural language texts. In many cases the problem caused by the lack of a large corpus of natural language texts can be solved by simply concatenating a set of collected texts, even with heterogeneous contexts and by different authors. In this note we present a preliminary study on a finite state model for text generation which maintains statistical and structural characteristics of natural language texts, i.e., Zipf's law and inverse-rank power law, thus providing a very good approximation for testing purposes.","PeriodicalId":286427,"journal":{"name":"Finite-State Methods and Natural Language Processing","volume":"258 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132895230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regular Expressions and Predicate Logic in Finite-State Language Processing","authors":"Mans Hulden","doi":"10.3233/978-1-58603-975-2-82","DOIUrl":"https://doi.org/10.3233/978-1-58603-975-2-82","url":null,"abstract":"This paper proposes an extension to the formalism of regular expressions with a form of predicate logic where quantified propositions apply to substrings. The implementation hinges crucially on the manipulation of auxiliary symbols which has been a common, though previously unsystematized practice in finite-state language processing. We also apply the notation to give alternate compilation methods for two-level grammars and various types of replacement rules found in the literature, and show that, under a certain interpretation, two-level rules and many types of replacement rules are equivalent.","PeriodicalId":286427,"journal":{"name":"Finite-State Methods and Natural Language Processing","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126843421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Representing and Combining Calendar Information by Using Finite-State Transducers","authors":"J. Niemi, K. Koskenniemi","doi":"10.3233/978-1-58603-975-2-122","DOIUrl":"https://doi.org/10.3233/978-1-58603-975-2-122","url":null,"abstract":"This paper elaborates a model for representing various types of semantic calendar expressions (SCEs), which correspond to the disambiguated intensional meanings of natural-language calendar phrases. The model uses finite-state transducers (FSTs) to mark denoted periods of time on a set of timelines also represented as an FST. In addition to an overview of the model, the paper presents methods to combine the periods marked on two timeline FSTs into a single timeline FST and to adjust the granularity and span of time of a timeline FST. The paper also discusses advantages and limitations of the model.","PeriodicalId":286427,"journal":{"name":"Finite-State Methods and Natural Language Processing","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133163039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Making Finite-State Methods Applicable to Languages Beyond Context-Freeness via Multi-dimensional Trees","authors":"Anna Kasprzik","doi":"10.3233/978-1-58603-975-2-98","DOIUrl":"https://doi.org/10.3233/978-1-58603-975-2-98","url":null,"abstract":"We provide a new term-like representation for multi-dimensional trees as defined by Rogers [1,2] which establishes them as a direct generalization of classical trees. As a consequence these structures can be used as input for finite-state applications based on classical term-based tree language theory. Via the correspondence between string and tree languages these applications can then be conceived to be able to process even some language classes beyond context-freeness.","PeriodicalId":286427,"journal":{"name":"Finite-State Methods and Natural Language Processing","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131155281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Forest FIRE and FIRE Wood: Tools for Tree Automata and Tree Algorithms","authors":"L. Cleophas","doi":"10.3233/978-1-58603-975-2-191","DOIUrl":"https://doi.org/10.3233/978-1-58603-975-2-191","url":null,"abstract":"Pattern matching, acceptance, and parsing algorithms on node-labeled, ordered, ranked trees ('tree algorithms') are important for applications such as instruction selection and tree transformation/term rewriting. Many such algorithms have been developed. They often are based on results from such algorithms on words or generalizations thereof using finite (tree) automata. Regrettably no coherent, extensive toolkit of such algorithms and automata existed, complicating their use. Our toolkit FOREST FIRE contains many such algorithms and automata constructions. It is accompanied by the graphical user interface (GUI) FIRE WOOD. The toolkit and GUI provide a useful environment for experimenting with and comparing the algorithms. In this tool paper we give an overview of the toolkit and GUI, their context and design rationale, and mention some results obtained with them.","PeriodicalId":286427,"journal":{"name":"Finite-State Methods and Natural Language Processing","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116945193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CLARIN and Free Open Source Finite-State Tools","authors":"K. Koskenniemi, Anssi Yli-Jyrä","doi":"10.3233/978-1-58603-975-2-3","DOIUrl":"https://doi.org/10.3233/978-1-58603-975-2-3","url":null,"abstract":"A new emerging European research infrastructure called CLARIN and a related project called HFST are briefly described. HFST has built a programming interface on top of some existing open source finite-state packages such as SFST and OpenFST. In order to verify its utility, HFST has built open source tools on top of this HFST interface. These tools create lexical transducers, compile morphophonological two-level rules and combine them into a transducer lexicon. The tools have been tested against independently created full-scale lexicons and rules for the Northern Sami and Lule Sami languages, which have more complicated lexical and morphophonological structure than most other European languages.","PeriodicalId":286427,"journal":{"name":"Finite-State Methods and Natural Language Processing","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114428930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large-Scale Statistical Machine Translation with Weighted Finite State Transducers","authors":"Graeme W. Blackwood, A. Gispert, J. Brunning, W. Byrne","doi":"10.3233/978-1-58603-975-2-39","DOIUrl":"https://doi.org/10.3233/978-1-58603-975-2-39","url":null,"abstract":"The Cambridge University Engineering Department phrase-based statistical machine translation system follows a generative model of translation and is implemented by the composition of component models of translation and movement realised as Weighted Finite State Transducers. Our flexible architecture requires no special purpose decoder and readily handles the large-scale natural language processing demands of state-of-the-art machine translation systems. In this paper we describe the CUED system's participation in the NIST 2008 Arabic-English machine translation evaluation task.","PeriodicalId":286427,"journal":{"name":"Finite-State Methods and Natural Language Processing","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114541836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Event Extraction for Italian Using a Cascade of Finite-State Grammars","authors":"Vanni Zavarella, Hristo Tanev, J. Piskorski","doi":"10.3233/978-1-58603-975-2-158","DOIUrl":"https://doi.org/10.3233/978-1-58603-975-2-158","url":null,"abstract":"This paper reports on our experience of adapting a real-world live event extraction system based on a cascade of finite-state extraction grammars to the processing of a new language, namely Italian. The real-time event extraction processing chain and the pattern specification language are briefly presented. The major part of the paper focuses on the creation of event extraction grammars and related resources for English and their adaptation for extracting events in Italian news articles. Some interesting phenomena which complicate the event extraction task for Italian are pinpointed and the results of the evaluation are presented. In particular, we compared two versions of the system for Italian, one based on surface-level patterns and a hybrid one, which integrates slightly more linguistically sophisticated patterns for covering a rich variety of morphological and syntactic constructions in Italian.","PeriodicalId":286427,"journal":{"name":"Finite-State Methods and Natural Language Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127016767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Simple Formalism for Capturing Reduplication in Finite-State Morphology","authors":"Mans Hulden, Shannon T. Bischoff","doi":"10.3233/978-1-58603-975-2-207","DOIUrl":"https://doi.org/10.3233/978-1-58603-975-2-207","url":null,"abstract":"This paper presents a simple formalism for capturing reduplication phenomena in the morphology and phonology of natural languages. After a brief survey of the facts common in reduplicative elements cross-linguistically, these facts are described in terms of finite-state systems. The principal idea is that an operator can be derived to ensure equivalence of finite discontinuous strings at some level of representation.","PeriodicalId":286427,"journal":{"name":"Finite-State Methods and Natural Language Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129388882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}