Fine-grained Emotion Classification: Class Imbalance Effects on Classifier Performance
Jasy Liew Suet Yan, Howard R. Turtle
2021 International Conference on Computer & Information Sciences (ICCOINS), published 2021-07-13
DOI: 10.1109/ICCOINS49721.2021.9497181
Citations: 0
Abstract
We present a set of machine learning experiments in fine-grained emotion classification that vary the proportion of positive and negative samples in the training data, in order to examine whether class imbalance affects classifier performance. The class distribution in a tweet corpus (EmoTweet-28) labelled with 28 emotion categories is highly skewed: the largest category (happiness) accounts for 11.5% and the smallest for only 0.2%. For each emotion category, negative examples far outnumber positive examples. Contrary to conventional wisdom, downsampling the data in our skewed corpus did not improve classifier performance. However, we found that adding more negative examples to the training data lowers recall but raises precision. The main contribution of this study is demonstrating how the ratio of positive to negative examples in the training data affects the performance of emotion classifiers.
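The abstract does not describe the exact sampling procedure or classifier, so the following is only a minimal sketch of the general idea, assuming each emotion category is treated as a separate binary (one-vs-rest) task, as the per-category positive/negative framing suggests. Under that assumption, a category occurring in 0.2% of examples yields about 499 negatives per positive (998 negatives for every 2 positives per 1,000 examples). The hypothetical helper `downsample_negatives` keeps every positive and randomly keeps a chosen number of negatives per positive, and `precision_recall` computes the two metrics the study compares. Both names are illustrative, not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical helper (not from the paper): label 1 = positive for the
# current emotion category, label 0 = negative (any other category).
def downsample_negatives(
    examples: Sequence[Tuple[str, int]],
    neg_per_pos: float,
    seed: int = 0,
) -> List[Tuple[str, int]]:
    """Keep all positives and a random subset of negatives so that roughly
    `neg_per_pos` negatives remain for each positive example."""
    positives = [ex for ex in examples if ex[1] == 1]
    negatives = [ex for ex in examples if ex[1] == 0]
    # Never request more negatives than actually exist.
    n_keep = min(len(negatives), round(neg_per_pos * len(positives)))
    rng = random.Random(seed)  # seeded for reproducible samples
    sample = positives + rng.sample(negatives, n_keep)
    rng.shuffle(sample)
    return sample


def precision_recall(gold: Sequence[int], pred: Sequence[int]) -> Tuple[float, float]:
    """Binary precision and recall for the positive class (label 1)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy corpus: 2 positives in 1,000 examples (0.2%), 998 negatives.
    data = [(f"tweet{i}", 1 if i < 2 else 0) for i in range(1000)]
    for ratio in (1, 5, 50):
        subset = downsample_negatives(data, ratio)
        print(ratio, len(subset))  # prints 1 4, then 5 12, then 50 102
```

Training one classifier per ratio and comparing `precision_recall` on a fixed held-out set is one way to observe the trade-off the abstract reports, where more negatives in training tend to lower recall and raise precision.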