Finite-State Morphological Analysis for Marathi
Vinit Ravishankar, Francis M. Tyers
Finite-State Methods and Natural Language Processing, 2017-09-01
DOI: 10.18653/v1/W17-4006 (https://doi.org/10.18653/v1/W17-4006)
Citation count: 5
Abstract
This paper describes the development of free/open-source morphological descriptions for Marathi, an Indo-Aryan language spoken in the Indian state of Maharashtra. We describe how we converted and used an existing Latin-script lexicon for our Devanagari-based analyser, taking into account the distinction between full vowels and vowel diacritics, which the Latin transliteration does not adequately capture. Marathi displays elements of both fusional and agglutinative morphology, so its morphology can be treated in several ways; we model it as a three-layer affixing system. We use the lttoolbox lexicon formalism to describe the finite-state transducer, and we attempt to work within a morphological framework that allows for some consistency across Indo-Aryan languages, with the aim of enabling machine translation between language pairs. An evaluation of our finite-state transducer shows adequate coverage (over 80% on two corpora) and good precision (over 97%).
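The distinction between full vowels and diacritics mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' conversion procedure: the romanisation scheme, the small character tables, and the function names `tokenise` and `to_devanagari` are all assumptions made for illustration. The point is only that a single Latin vowel such as "aa" corresponds to two different Devanagari characters, depending on whether it follows a consonant.

```python
# Hypothetical, simplified romanisation for illustration only;
# this is NOT the scheme of the lexicon used in the paper.
CONSONANTS = {
    "k": "क", "t": "त", "d": "द", "n": "न", "p": "प",
    "m": "म", "r": "र", "l": "ल", "v": "व", "s": "स",
}

# Each Latin vowel maps to (independent "full" vowel, dependent diacritic).
# The inherent vowel "a" has no written diacritic after a consonant.
VOWELS = {
    "a": ("अ", ""),
    "aa": ("आ", "ा"),
    "i": ("इ", "ि"),
    "ii": ("ई", "ी"),
    "u": ("उ", "ु"),
    "uu": ("ऊ", "ू"),
    "e": ("ए", "े"),
    "o": ("ओ", "ो"),
}

VIRAMA = "\u094d"  # joins two consonants with no vowel between them


def tokenise(word):
    """Split a romanised word into units, preferring two-letter units."""
    tokens = []
    i = 0
    while i < len(word):
        for size in (2, 1):
            chunk = word[i:i + size]
            if chunk in VOWELS or chunk in CONSONANTS:
                tokens.append(chunk)
                i += len(chunk)
                break
        else:
            raise ValueError(f"unsupported symbol at position {i}: {word[i]!r}")
    return tokens


def to_devanagari(word):
    """Convert a romanised word, choosing full vowels or diacritics by context."""
    tokens = tokenise(word)
    out = []
    for idx, tok in enumerate(tokens):
        if tok in CONSONANTS:
            out.append(CONSONANTS[tok])
            nxt = tokens[idx + 1] if idx + 1 < len(tokens) else None
            if nxt in CONSONANTS:
                # Consonant cluster: suppress the inherent vowel.
                out.append(VIRAMA)
        else:
            full, sign = VOWELS[tok]
            after_consonant = idx > 0 and tokens[idx - 1] in CONSONANTS
            # Diacritic after a consonant; full vowel word-initially
            # or after another vowel.
            out.append(sign if after_consonant else full)
    return "".join(out)


for latin in ("kaam", "aaii", "udaas", "mast"):
    print(latin, to_devanagari(latin))
```

In this sketch the "aa" in "kaam" becomes a diacritic attached to the preceding consonant, while the "aa" in "aaii" becomes a full vowel because nothing precedes it. A real lexicon conversion would need a complete character inventory and language-specific conventions that are not covered here.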