Tackling the Problem of Class Imbalance in Multi-class Sentiment Classification: An Experimental Study

IF 1.3 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Foundations of Computing and Decision Sciences, Pub Date: 2019-06-01, DOI: 10.2478/fcds-2019-0009

Mateusz Lango

Citations: 18

Abstract

Sentiment classification is an important task that has gained extensive attention in both academia and industry. Many issues related to this task, such as the handling of negation or sarcastic utterances, were analyzed and addressed in previous works. However, the issue of class imbalance, which often compromises the predictive capabilities of learning algorithms, has scarcely been studied. In this work, we aim to bridge the gap between imbalanced learning and sentiment analysis. An experimental study including twelve imbalanced learning preprocessing methods, four feature representations, and a dozen datasets is carried out to analyze the usefulness of imbalanced learning methods for sentiment classification. Moreover, the data difficulty factors commonly studied in imbalanced learning are investigated on sentiment corpora to evaluate the impact of class imbalance.


