Rule Based Part of Speech Tagging of Sindhi Language
J. Mahar, G. Q. Memon
2010 International Conference on Signal Acquisition and Processing, 2010-02-09
DOI: 10.1109/ICSAP.2010.27 (https://doi.org/10.1109/ICSAP.2010.27)
Citations: 25
Abstract
Part of Speech (POS) tagging is the process of assigning the correct syntactic category to each word in a text. A tag set and word disambiguation rules are fundamental parts of any POS tagger. No tag set for the Sindhi language has hitherto been published, and no Sindhi lexicon for computational processing is available. In this study, a tag set for Sindhi POS, a lexicon, and word disambiguation rules are designed and developed. The Sindhi corpus is collected from a comprehensive Sindhi dictionary and is based on the most recent available vocabulary used by local people. This paper presents preliminary results of a rule-based Sindhi Part of Speech (SPOS) tagger. Tagging and tokenization algorithms are also designed for the implementation of SPOS. The outputs of SPOS are verified by a Sindhi linguist. The development of the SPOS tagger may be an important milestone towards computational Sindhi language processing.
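
The abstract names three components: a tokenizer, a lexicon, and word disambiguation rules. The sketch below shows one way such pieces can fit together in a rule-based tagger. It is a minimal illustration under stated assumptions, not the authors' SPOS implementation: the lexicon entries are toy English words, the tag names (DET, NOUN, VERB, PRON, PUNCT, UNK) are hypothetical, and the two context rules are invented for the example. The paper's actual Sindhi tag set, lexicon, and rules are not given in the abstract.

```python
import re

# Hypothetical toy lexicon: each word maps to the set of tags it may take.
# These English entries and tag names are illustrative only; they are not
# the paper's Sindhi lexicon or tag set.
LEXICON = {
    "the": {"DET"},
    "a": {"DET"},
    "i": {"PRON"},
    "book": {"NOUN", "VERB"},
    "flight": {"NOUN"},
    "read": {"VERB"},
}


def tokenize(text):
    """Split text into lowercase word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def disambiguate(candidates, prev_tag):
    """Pick one tag from several candidates using simple context rules."""
    # Rule 1 (assumed): after a determiner, prefer the noun reading.
    if prev_tag == "DET" and "NOUN" in candidates:
        return "NOUN"
    # Rule 2 (assumed): at sentence start or after a pronoun, prefer the verb.
    if prev_tag in (None, "PRON") and "VERB" in candidates:
        return "VERB"
    # Fallback: a deterministic choice so the output is reproducible.
    return sorted(candidates)[0]


def tag(text):
    """Tag each token by lexicon lookup, applying rules to ambiguous words."""
    tagged = []
    prev_tag = None
    for token in tokenize(text):
        candidates = LEXICON.get(token)
        if candidates is None:
            # Out-of-lexicon tokens: words become UNK, anything else PUNCT.
            chosen = "UNK" if re.fullmatch(r"\w+", token) else "PUNCT"
        elif len(candidates) == 1:
            chosen = next(iter(candidates))
        else:
            chosen = disambiguate(candidates, prev_tag)
        tagged.append((token, chosen))
        prev_tag = chosen
    return tagged


if __name__ == "__main__":
    print(tag("I read the book."))
    print(tag("Book a flight"))
```

In this sketch the ambiguous word "book" receives a different tag depending on the tag of the preceding token, which is the kind of decision that word disambiguation rules are meant to encode. A real tagger would need a far larger lexicon and a rule set designed by a linguist for the target language.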