Enhancing the Performance of POS based Features using Generalization for Sentiment Classification
Authors: K. Kalaivani, C. Kanimozhiselvi, V. Rajasekar
Published in: 2022 6th International Conference on Computing Methodologies and Communication (ICCMC), 2022-03-29
DOI: 10.1109/ICCMC53470.2022.9754079 (https://doi.org/10.1109/ICCMC53470.2022.9754079)
Citations: 0
Abstract
Sentiment classification is the task of determining whether the polarity of an opinionated text is positive or negative. Companies are interested in learning how consumers feel about their products by studying the views expressed on review pages, blogs, tweets, discussion boards and web portals. Politicians and governments also use sentiment classification when defining campaign plans and policies. The aim of this work is to use part-of-speech (POS) based knowledge in a machine learning approach to decide whether an opinionated document is positive or negative. To obtain a more effective feature space and to reduce the sparsity of the feature vector, bigrams are generalized by backing off either the first word or the second word to its respective POS cluster. Experiments show that combining POS-based unigram features with generalized bigrams outperforms the other feature sets in terms of accuracy when a Multinomial Naive Bayes (MNB) classifier is used.
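The back-off step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy tag lookup `TOY_TAGS`, the helper names `tag` and `pos_features`, the use of plain POS tags in place of the paper's POS clusters, and the choice to generalize every adjacent word pair are all assumptions made for illustration. A real system would use a trained POS tagger and pass the resulting counts to an MNB classifier.

```python
from collections import Counter

# Hypothetical toy tag lookup (an assumption); a real system would use a trained POS tagger.
TOY_TAGS = {
    "the": "DT", "movie": "NN", "plot": "NN", "was": "VB", "is": "VB",
    "very": "RB", "good": "JJ", "bad": "JJ", "boring": "JJ", "great": "JJ",
}


def tag(tokens):
    """Pair each token with a coarse POS tag; unknown words get 'UNK'."""
    return [(token, TOY_TAGS.get(token, "UNK")) for token in tokens]


def pos_features(text):
    """Count unigrams plus bigrams generalized by backing off one word to its POS tag."""
    tagged = tag(text.lower().split())
    feats = Counter(word for word, _ in tagged)
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        feats[f"{t1}_{w2}"] += 1  # first word backed off to its tag
        feats[f"{w1}_{t2}"] += 1  # second word backed off to its tag
    return feats


# Two reviews with different adjectives share a generalized bigram,
# which is how back-off reduces sparsity in the feature vector.
shared = set(pos_features("very good")) & set(pos_features("very bad"))
print(sorted(shared))  # ['very', 'very_JJ']
```

The literal bigrams "very good" and "very bad" would be two unrelated features, whereas the generalized form "very_JJ" appears in both documents, so the classifier sees more evidence per feature.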