Anirban Mukherjee, S. Mukhopadhyay, P. Panigrahi, Saptarsi Goswami
Title: Utilization of Oversampling for multiclass sentiment analysis on Amazon Review Dataset
Published in: 2019 IEEE 10th International Conference on Awareness Science and Technology (iCAST), 2019-10-01
DOI: 10.1109/ICAwST.2019.8923260 (https://doi.org/10.1109/ICAwST.2019.8923260)
Citations: 14
Abstract
Sentiment analysis is a major element of artificial intelligence, with applications in machine translation, text analysis, computational linguistics, and related areas. In most cases, sentiment is classified into two or three classes. In some settings, however, such as rating a product on Amazon, there are more classes. A major challenge in such tasks is class imbalance, which biases the model and reduces its accuracy. To address this problem, we oversample the dataset to reduce its class imbalance before training the model. In this work, we first compare variants of recurrent neural networks (simple RNN, GRU, LSTM, and bidirectional LSTM) to determine which performs best at multi-class sentiment classification. We then use the best-performing model to study the effect of oversampling a dataset before using it to train a model.
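The abstract does not say which oversampling method the authors used. As a minimal sketch of the general idea, the Python below implements plain random oversampling: minority-class examples are duplicated at random until every class matches the size of the largest one. The function name `random_oversample`, the `seed` parameter, and the toy review data are all assumptions made for illustration; none of them come from the paper.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class examples at random (with replacement)
    until every class has as many examples as the largest class.

    Illustrative sketch only: the paper's actual oversampling method
    is not specified in the abstract.
    """
    rng = random.Random(seed)

    # Group the examples by class label.
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())

    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)

    # Shuffle so that classes are not stored in contiguous runs.
    order = list(range(len(out_texts)))
    rng.shuffle(order)
    return [out_texts[i] for i in order], [out_labels[i] for i in order]


# Made-up five-class rating data, skewed toward 5-star reviews.
texts = ["great"] * 6 + ["good"] * 3 + ["ok"] * 2 + ["poor"] + ["awful"]
labels = [5] * 6 + [4] * 3 + [3] * 2 + [2] + [1]

bal_texts, bal_labels = random_oversample(texts, labels)
print(Counter(labels))      # class counts before oversampling
print(Counter(bal_labels))  # class counts after oversampling
```

In a setup like this, oversampling would normally be applied to the training split only, so that duplicated reviews do not also appear in the validation or test data.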