Text emotion mining on Twitter

IOP SciNotes | Pub Date: 2020-10-23 | DOI: 10.1088/2633-1357/abc01e

Suboh Alkhushayni, Daniel C Zellmer, Ryan J DeBusk, Du’a Al-zaleq

Citations: 2

Abstract

Twitter has become a medium through which a substantial share of the global population communicates feelings and reactions to current events. Emotion mining from text aims to capture these emotions by applying a series of algorithms to the content of each tweet. In this study, tweets expressing at least one of seven basic emotions were collected. The resulting dataset was a corpus of 42,000 tweets with a balanced presence of each emotion. From this corpus, a lexicon of roughly 40,000 words was created, each word associated with a weighted vector corresponding to one of the emotions. Next, different methods of identifying emotion in the cleaned tweets were applied and evaluated. These methods included both lexicon-based classification and supervised machine-learning classification. Finally, an ensemble method combining several multi-class classifiers trained on unigram features of the lexicon was evaluated. The ensemble method outperformed all other tested methods, both on existing datasets and on the dataset created for this study.
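The two ideas named in the abstract, scoring a tweet against a weighted emotion lexicon and combining several classifiers' outputs, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the emotion labels, the toy lexicon and its weights, the helper names, and the simple majority vote are not taken from the paper, which does not specify these details in its abstract.

```python
from collections import Counter, defaultdict

PUNCT = ".,!?;:'\"()"

# Hypothetical toy lexicon: word -> {emotion: weight}.
# Labels and weights are invented for illustration, not the paper's lexicon.
TOY_LEXICON = {
    "happy": {"joy": 0.9},
    "love": {"joy": 0.7},
    "furious": {"anger": 0.95},
    "hate": {"anger": 0.8, "sadness": 0.2},
    "cry": {"sadness": 0.85},
    "scared": {"fear": 0.9},
}


def tokenize(text):
    """Split on whitespace, strip surrounding punctuation, lowercase (unigrams)."""
    tokens = (raw.strip(PUNCT).lower() for raw in text.split())
    return [t for t in tokens if t]


def lexicon_scores(text, lexicon=TOY_LEXICON):
    """Sum the lexicon weights of every known token, per emotion."""
    scores = defaultdict(float)
    for token in tokenize(text):
        for emotion, weight in lexicon.get(token, {}).items():
            scores[emotion] += weight
    return dict(scores)


def lexicon_classify(text, lexicon=TOY_LEXICON):
    """Return the highest-scoring emotion, or None if no token is in the lexicon."""
    scores = lexicon_scores(text, lexicon)
    if not scores:
        return None
    return max(scores, key=scores.get)


def majority_vote(predictions):
    """Combine labels from several classifiers by plain majority vote.

    None entries (classifiers that abstained) are ignored. This is one
    simple way to build an ensemble; the paper may combine its
    classifiers differently.
    """
    votes = Counter(p for p in predictions if p is not None)
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

In use, `lexicon_classify(tweet)` would supply one vote, and each supervised classifier trained on unigram features would supply another, with `majority_vote` choosing the final label.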

