Bangla Text Sentiment Analysis Using Supervised Machine Learning with Extended Lexicon Dictionary
Nitish Ranjan Bhowmik, M. Arifuzzaman, M. Mondal, Md. Saiful Islam
Natural Language Processing Research, published 2021-03-01
DOI: https://doi.org/10.2991/NLPR.D.210316.001
Citations: 24
Abstract
With the proliferation of the Internet’s social digital content, sentiment analysis (SA) has gained wide research interest in natural language processing (NLP). Little significant research has been done in the Bangla language domain because of the intricate grammatical structure of its text. This paper focuses on SA in the context of the Bangla language. First, a domain-specific, categorical, weighted lexicon data dictionary (LDD) is developed for analyzing sentiment in Bangla. This LDD is built by applying normalization, tokenization, and stemming to two Bangla datasets available in a GitHub repository. Second, a novel rule-based algorithm termed Bangla Text Sentiment Score (BTSC) is developed for detecting sentence polarity. This algorithm considers part-of-speech-tagged words and special characters to generate a score for a word and thus for a sentence and a blog. The BTSC algorithm, along with the LDD, is applied to extract sentiment by generating scores for the two Bangla datasets. Third, two feature matrices are developed by applying term frequency-inverse document frequency (tf-idf) to the two datasets and by using the corresponding BTSC scores. Next, supervised machine learning classifiers are applied to the feature matrices
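
The feature-construction step described in the abstract can be illustrated with a minimal sketch. It is not the paper's BTSC algorithm or its LDD: the lexicon entries, their weights, the transliterated tokens, and every function name below are hypothetical, and the score is a plain sum of lexicon weights with no part-of-speech or special-character rules. How the paper combines tf-idf values with BTSC scores is not stated in the abstract, so appending the score as one extra column is an assumption.

```python
import math
from collections import Counter

# Hypothetical toy lexicon: transliterated tokens with made-up weights.
# The paper's actual LDD entries and weights are not given in the abstract.
LEXICON = {"bhalo": 1.0, "sundor": 0.8, "kharap": -1.0, "baje": -0.7}


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real normalization)."""
    return text.lower().split()


def lexicon_score(tokens, lexicon=LEXICON):
    """Sum the lexicon weights of the tokens; unknown tokens contribute 0."""
    return sum(lexicon.get(tok, 0.0) for tok in tokens)


def tfidf_matrix(docs):
    """Return (vocabulary, rows) of tf-idf weights, with idf = ln(N / df)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    rows = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty documents
        rows.append(
            [(counts[tok] / length) * math.log(n_docs / df[tok]) for tok in vocab]
        )
    return vocab, rows


def build_features(texts):
    """Append each text's lexicon score to its tf-idf row (assumed layout)."""
    docs = [tokenize(t) for t in texts]
    vocab, rows = tfidf_matrix(docs)
    features = [row + [lexicon_score(doc)] for row, doc in zip(rows, docs)]
    return vocab, features


texts = ["cinema ta bhalo", "cinema ta kharap baje"]
vocab, features = build_features(texts)
```

Each row of `features` holds one tf-idf weight per vocabulary token followed by the lexicon score, and such rows could be passed to any supervised classifier together with polarity labels. Tokens that occur in every document, such as `cinema` here, receive a tf-idf weight of zero under this unsmoothed idf.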