Probabilistic and Neural Network Based POS Tagging of Ambiguous Nepali text: A Comparative Study
A. Pradhan, A. Yajnik
2021 International Symposium on Electrical, Electronics and Information Engineering, 2021-02-19
https://doi.org/10.1145/3459104.3459146
Part-of-speech (POS) tagging is the task of assigning a part-of-speech tag to each word of a text, and various approaches to it exist. This article presents a comprehensive study and comparison of two POS tagging techniques for Nepali text: one based on the Hidden Markov Model (HMM) and one based on the General Regression Neural Network (GRNN). The two taggers resolve ambiguity in the POS tagging of Nepali text through these two different approaches. The taggers are evaluated on the corpora developed and provided by TDIL (Technology Development for Indian Languages). Apart from the corpora, the Python and Java programming languages and the NLTK library were used for implementation. Both taggers achieve an accuracy of 100 percent for known words with no ambiguity. For ambiguous words, the accuracy is 58.29 percent (HMM) and 60.45 percent (GRNN), and for non-ambiguous unknown words it is 85.36 percent (GRNN).
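To make the two approaches concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The first part trains a bigram HMM from tagged sentences and decodes with the Viterbi algorithm; the second part is a generic GRNN output computed as a Gaussian-kernel weighted average of stored training targets. Everything specific here is an assumption made for illustration: the toy English sentences stand in for the TDIL Nepali corpus, the tag names, the add-one smoothing, the two-element feature vectors, the value of `sigma`, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data (English stand-in, NOT the TDIL Nepali corpus).
# "run" is ambiguous: it occurs both as NOUN and as VERB.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("run", "NOUN"), ("ends", "VERB")],
    [("dogs", "NOUN"), ("run", "VERB")],
]


def train_hmm(sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[previous_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    for sent in sentences:
        prev = "<s>"  # sentence-start pseudo tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit


def log_prob(counter, key, n_outcomes, alpha=1.0):
    """Add-alpha smoothed log probability (smoothing is an assumption)."""
    total = sum(counter.values())
    return math.log((counter[key] + alpha) / (total + alpha * n_outcomes))


def viterbi_tag(words, trans, emit):
    """Return the most probable tag sequence under the bigram HMM."""
    tags = sorted(emit)
    # +1 reserves probability mass for words never seen in training.
    vocab = len({w for c in emit.values() for w in c}) + 1
    # best[i][tag] = (log score of best path ending in tag, previous tag)
    best = [{t: (log_prob(trans["<s>"], t, len(tags))
                 + log_prob(emit[t], words[0], vocab), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit[t], words[i], vocab)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, len(tags)) + e, p)
                for p in tags
            )
        best.append(column)
    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]


def grnn_predict(x, patterns, targets, sigma=0.5):
    """GRNN output: Gaussian-kernel weighted average of stored targets."""
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / (2 * sigma ** 2))
        for p in patterns
    ]
    total = sum(weights)
    return [
        sum(w * t[k] for w, t in zip(weights, targets)) / total
        for k in range(len(targets[0]))
    ]


# Hypothetical features for the word "run":
# [previous tag is DET, previous tag is NOUN]
PATTERNS = [[1.0, 0.0], [0.0, 1.0]]
# One-hot targets over (NOUN, VERB)
TARGETS = [[1.0, 0.0], [0.0, 1.0]]

trans, emit = train_hmm(TRAIN)
print(viterbi_tag(["the", "run", "ends"], trans, emit))
print(viterbi_tag(["dogs", "run"], trans, emit))
print(grnn_predict([1.0, 0.0], PATTERNS, TARGETS))
```

On this toy data the HMM tags "run" as NOUN after "the" and as VERB after "dogs", which is the kind of context-based disambiguation the abstract refers to. The GRNN part needs no iterative training: it stores the training patterns and scores a query by its distance to each of them, so the tag with the largest averaged target is taken as the prediction.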