Design of a POS tagger using conditional random fields for Malayalam
V. Krishnapriya, P. Sreesha, T. R. Harithalakshmi, T. Archana, Jayasree N. Vettath
2014 First International Conference on Computational Systems and Communications (ICCSC), 2014-12-01
DOI: 10.1109/COMPSC.2014.7032680 (https://doi.org/10.1109/COMPSC.2014.7032680)
Citations: 5
Abstract
Part-of-speech (POS) tagging is the process of marking each word in a text with its part of speech, based on the word's definition and context. POS taggers play an important role in natural language applications such as speech recognition, natural language parsing, and information retrieval and extraction. This paper discusses an architecture for designing a POS tagger for the Malayalam language using Conditional Random Fields (CRF). The experiments presented in this paper use an annotated corpus of 1028 sentences (11,315 words) and a tagset of 100 tags. A trigram-based tagging scheme is used in the experiments. The proposed system is based on an empirical approach that models the human POS tagging process more realistically than existing systems, without compromising efficiency or accuracy.
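The abstract does not specify the features used, so the following is only a minimal sketch of how a trigram-style context could be turned into per-token features for a linear-chain CRF. It assumes "trigram" means a three-word window (previous, current, next word). The paper may instead mean a tag trigram. The function names, the `<BOS>`/`<EOS>` padding symbols, the three-character suffix feature, and the transliterated example tokens are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical padding symbols for sentence boundaries (not from the paper).
BOS = "<BOS>"
EOS = "<EOS>"


def token_features(tokens: List[str], i: int) -> Dict[str, str]:
    """Build features for tokens[i] from an assumed three-word window."""
    word = tokens[i]
    prev_word = tokens[i - 1] if i > 0 else BOS
    next_word = tokens[i + 1] if i < len(tokens) - 1 else EOS
    return {
        "word": word,
        "prev_word": prev_word,
        "next_word": next_word,
        # Suffixes often carry inflectional cues in agglutinative languages;
        # the length 3 is an arbitrary illustrative choice.
        "suffix3": word[-3:],
        # Joint feature over the whole three-word window.
        "trigram": f"{prev_word}|{word}|{next_word}",
    }


def sentence_features(tokens: List[str]) -> List[Dict[str, str]]:
    """Return one feature dictionary per token in the sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    # Illustrative transliterated tokens, not drawn from the paper's corpus.
    sentence = ["avan", "pustakam", "vayichu"]
    for feats in sentence_features(sentence):
        print(feats)
```

A list of feature dictionaries like this, paired with a list of gold tags per sentence, is the kind of input that common CRF toolkits accept for training. The actual feature templates would need to be chosen against the annotated corpus.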