Imbalanced Twitter Sentiment Analysis using Minority Oversampling
Kushankur Ghosh, Arghasree Banerjee, Sankhadeep Chatterjee, S. Sen
2019 IEEE 10th International Conference on Awareness Science and Technology (iCAST), 2019-10-01
DOI: 10.1109/ICAwST.2019.8923218 (https://doi.org/10.1109/ICAwST.2019.8923218)
Citation count: 16
Abstract
Micro-blogging platforms have become one of the most popular media for expressing opinions and sentiment about social events and entities. Machine-learning-based sentiment analysis has proven successful at identifying people's opinions from abundantly available data. However, recent studies have pointed out that the data used to train such machine learning models can be highly imbalanced. In the current study, live tweets from Twitter are used to systematically examine the effect of the class imbalance problem on sentiment analysis. A minority oversampling method is employed to manage the class imbalance. Two well-known classifiers, Support Vector Machine and Multinomial Naïve Bayes, are used to classify tweets into positive or negative sentiment classes. The results reveal that minority-oversampling-based methods can overcome the class imbalance problem to a large extent.
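To make the idea of minority oversampling concrete, here is a minimal sketch. The abstract does not say which oversampling variant the authors used, so this example assumes the simplest one, random duplication of minority-class examples. The function name `oversample_minority` and the toy tweets are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def oversample_minority(texts, labels, seed=0):
    """Randomly duplicate minority-class examples until every class
    has as many examples as the largest class.

    Assumption: plain random oversampling; the paper may use a
    different variant.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    # Start from the original data and append the duplicates to it.
    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        pool = [t for t, y in zip(texts, labels) if y == label]
        # Draw (with replacement) the examples needed to reach the target size.
        extra = [rng.choice(pool) for _ in range(target - count)]
        out_texts.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_texts, out_labels


# Hypothetical toy data: 4 positive tweets, 1 negative tweet.
tweets = ["great game", "love this", "so happy", "nice win", "terrible call"]
sentiment = ["pos", "pos", "pos", "pos", "neg"]

balanced_tweets, balanced_sentiment = oversample_minority(tweets, sentiment)
print(Counter(balanced_sentiment))
```

Oversampling of this kind should be applied to the training split only, so that duplicated tweets do not leak into the evaluation data. The balanced texts could then be vectorized and passed to classifiers such as the two named in the abstract, for example scikit-learn's `LinearSVC` and `MultinomialNB`, though the abstract does not state which implementations the authors used.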