A lexicon pool augmented Naive Bayes Classifier for Nepali Text
S. Thakur, V. Singh
2014 Seventh International Conference on Contemporary Computing (IC3), 2014-09-15
DOI: 10.1109/IC3.2014.6897231 (https://doi.org/10.1109/IC3.2014.6897231)
Citation count: 11
Abstract
This paper presents our experimental work on machine classification of Nepali texts. We implemented a Naive Bayes classifier for the task and then augmented it with multinomial lexicon pooling. The lexicon-pooled Naive Bayes classifier obtains better results on the classification task than a standard Naive Bayes implementation. This hybrid approach also helps compensate for the lack of linguistic resources for Nepali (such as a stemmer, a stop-word list, and an accurate POS tagger). The proposed lexicon-pooled Naive Bayes approach is evaluated on a sufficiently large dataset of Nepali news stories. The experimental results show that the method achieves higher classification accuracy and is useful for Nepali text classification. The paper also contributes resources to Nepali language processing, in the form of a Nepali news stories corpus and a domain-specific lexicon for Nepali news stories.
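The abstract does not state the pooling formula, so the following is only a minimal sketch of how a lexicon-pooled multinomial Naive Bayes classifier might look, assuming a linear pool of two word distributions: one estimated from labelled documents and one derived from a class lexicon. The function names, the `boost` and `beta` parameters, the whitespace tokenizer, and the toy English tokens are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def train_nb(docs, labels, vocab, alpha=1.0):
    """Estimate class priors and Laplace-smoothed P(word | class) from labelled docs."""
    class_docs = Counter(labels)
    counts = {c: Counter() for c in class_docs}
    for doc, label in zip(docs, labels):
        counts[label].update(doc.split())  # whitespace tokens, no stemming
    prior = {c: n / len(docs) for c, n in class_docs.items()}
    cond = {}
    for c in class_docs:
        total = sum(counts[c][w] for w in vocab) + alpha * len(vocab)
        cond[c] = {w: (counts[c][w] + alpha) / total for w in vocab}
    return prior, cond


def lexicon_model(lexicon, vocab, classes, boost=5.0):
    """Assumed lexicon model: a word listed for a class weighs `boost`, others 1."""
    cond = {}
    for c in classes:
        weights = {w: boost if w in lexicon.get(c, ()) else 1.0 for w in vocab}
        total = sum(weights.values())
        cond[c] = {w: v / total for w, v in weights.items()}
    return cond


def pool(cond_data, cond_lex, beta=0.5):
    """Linear pool: beta * P_data(w | c) + (1 - beta) * P_lex(w | c)."""
    return {
        c: {w: beta * cond_data[c][w] + (1 - beta) * cond_lex[c][w]
            for w in cond_data[c]}
        for c in cond_data
    }


def classify(text, prior, cond):
    """Return the class with the highest log posterior; unknown words are skipped."""
    scores = {}
    for c in prior:
        score = math.log(prior[c])
        for w in text.split():
            if w in cond[c]:
                score += math.log(cond[c][w])
        scores[c] = score
    return max(scores, key=scores.get)


# Toy data (hypothetical); a real run would use Nepali news stories instead.
docs = ["team match goal", "match team win",
        "vote party minister", "party vote election"]
labels = ["sports", "sports", "politics", "politics"]
lexicon = {"sports": {"cricket", "goal"}, "politics": {"parliament", "election"}}

# The vocabulary covers both training words and lexicon words.
vocab = {w for d in docs for w in d.split()} | set().union(*lexicon.values())
prior, cond_data = train_nb(docs, labels, vocab)
cond_pooled = pool(cond_data, lexicon_model(lexicon, vocab, prior))
print(classify("cricket", prior, cond_pooled))
```

In this sketch, a word such as `cricket` that never appears in the training documents still carries class evidence through the lexicon term, which is one way a lexicon can partly stand in for missing preprocessing resources. Setting `beta=1.0` reduces the pooled model to the plain Naive Bayes estimate.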