Sentimental Analysis based on hybrid approach of Latent Dirichlet Allocation and Machine Learning for Large-Scale of Imbalanced Twitter Data

Nasir Jamal, Xianqiao Chen, Junaid Hussain Abro, Doniyor Tukhtakhunov
Published in: Proceedings of the 2020 3rd International Conference on Algorithms, Computing and Artificial Intelligence
Publication date: 2020-12-24
DOI: 10.1145/3446132.3446413 (https://doi.org/10.1145/3446132.3446413)
Citations: 2

Abstract

Classifying emotions in large volumes of Twitter data is an effective way to analyze users' moods about a product, news item, topic, and so on. However, extracting meaningful features from a burst of raw tweets is challenging because emotions are subjective, with limited and fuzzy boundaries. These subjective features can be expressed in different terminology and reflect different perceptions. In this paper, we propose a hybrid approach that combines LDA and machine learning to predict emotions in large-scale imbalanced tweets. First, the raw tweets are preprocessed with tokenization to capture useful features and discard noisy information. Second, the local and global importance of each feature is estimated with the TF-IDF statistical technique. Third, the Latent Dirichlet Allocation (LDA) topic modeling method is used to extract topics from these features. These topics explain the concepts of the related tweets, which helps classification. Fourth, the Adaptive Synthetic (ADASYN) class-balancing technique is applied to oversample the data and balance each class of topics. Finally, the K-Nearest Neighbor (KNN) machine learning algorithm is applied to predict the emotions in the extracted topics. The class-balancing method increases the significance of the minority classes and addresses the class imbalance problem. The proposed approach is evaluated on two different Twitter emotion datasets. The results show that it outperforms popular state-of-the-art methods in terms of precision, recall, F-measure, and classification accuracy.
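
The abstract names ADASYN as the class-balancing step but gives no implementation details. The following is a minimal sketch of the general ADASYN idea for the binary case, not the authors' code: minority points whose nearest neighbours mostly belong to the other class receive a larger share of the synthetic samples, and each synthetic sample is an interpolation between a minority point and one of its minority-class neighbours. The function names, the toy 2-D points, and the labels "joy" and "anger" are assumptions made for illustration; in the described pipeline the inputs would be topic features rather than hand-written coordinates.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(index, points, candidates, k):
    """Indices of the k candidates closest to points[index], excluding itself."""
    others = [j for j in candidates if j != index]
    others.sort(key=lambda j: distance(points[index], points[j]))
    return others[:k]


def adasyn_oversample(points, labels, minority, k=3, seed=0):
    """Generate synthetic minority samples (simplified, binary ADASYN-style)."""
    rng = random.Random(seed)
    everyone = list(range(len(points)))
    minority_idx = [i for i in everyone if labels[i] == minority]
    # Number of synthetic samples needed to match the size of the other class.
    needed = (len(points) - len(minority_idx)) - len(minority_idx)
    if needed <= 0 or len(minority_idx) < 2:
        return []

    # Hardness of each minority point: fraction of its k nearest neighbours
    # (searched over the whole data set) that belong to the other class.
    hardness = []
    for i in minority_idx:
        neigh = nearest(i, points, everyone, k)
        hardness.append(sum(labels[j] != minority for j in neigh) / len(neigh))

    total = sum(hardness)
    if total == 0:
        weights = [1 / len(minority_idx)] * len(minority_idx)
    else:
        weights = [h / total for h in hardness]

    synthetic = []
    for i, w in zip(minority_idx, weights):
        same_class = nearest(i, points, minority_idx, k)
        # Rounding means the total can differ slightly from `needed`.
        for _ in range(round(w * needed)):
            j = rng.choice(same_class)
            lam = rng.random()  # interpolation factor in [0, 1)
            synthetic.append(tuple(
                a + lam * (b - a) for a, b in zip(points[i], points[j])
            ))
    return synthetic


# Toy data (hypothetical): 3 minority "joy" points, 6 majority "anger" points.
points = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0),
          (1.1, 0.9), (1.2, 1.1), (0.9, 1.2),
          (1.3, 1.0), (1.0, 0.8), (0.8, 1.0)]
labels = ["joy"] * 3 + ["anger"] * 6

synthetic = adasyn_oversample(points, labels, minority="joy", k=3)
print(len(synthetic), "synthetic 'joy' samples")
```

On this toy set the minority point at (1.0, 1.0) is surrounded entirely by the other class, so it gets the largest weight. For real experiments, a maintained implementation such as the `ADASYN` class in the imbalanced-learn package would normally be preferable to a hand-written sketch.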