Linguistic Feature-based Classification for Anger and Anticipation using Machine Learning

K. Ramakrishnan, Vimala Balakrishnan, Kumanan Govaichelvan
DOI: 10.5220/0011289300003277 (https://doi.org/10.5220/0011289300003277)
Source (as listed): News. Phi Delta Epsilon, volume "46 1", pages 140-147
Publication date: 2022-01-01
Publication type: Journal Article
Platform: Semanticscholar
Citations: 0

Abstract

The growing volume of online discourse enables the development of emotion mining models using natural language processing techniques. However, language diversity and cultural disparity alter the sentiment orientation of words depending on the community and context. This study therefore investigates the impact of linguistic features, namely lexical and syntactic features, on predicting the presence of two emotions, anger and anticipation, among Malaysian YouTube users. Term Frequency-Inverse Document Frequency (TF-IDF), unigrams, bigrams and part-of-speech tags were used as features to observe classification performance. The dataset contains 2500 YouTube comments by Malaysian users on 46 Covid-19 related videos. Comments were extracted from three prominent Malaysian-centric English news channels, Channel News Asia (CNA), The Star News, and New Straits Times, and cover the period from 16 March 2020 to 30 April 2020 (i.e., the first lockdown phase). Six classification algorithms were tested: Random Forest, Support Vector Machine, Logistic Regression, Decision Tree, K-Nearest Neighbour and Multinomial Naive Bayes. Support Vector Machine with TF-IDF provided the best performance, achieving accuracy of 76% for anger and 73% for anticipation.
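
The abstract names TF-IDF over unigrams and bigrams as lexical features but does not give the authors' implementation. The following is only a minimal, standard-library sketch of how such weights could be computed. The tokenization (lowercase, whitespace split), the unsmoothed idf formula, the function names, and the three toy comments are all assumptions made for illustration; none of them comes from the paper or its dataset.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list, each joined by a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_terms(text):
    """Lowercase, split on whitespace, and emit unigrams followed by bigrams."""
    tokens = text.lower().split()
    return ngrams(tokens, 1) + ngrams(tokens, 2)


def tfidf(docs):
    """Compute one {term: weight} dictionary per document.

    tf  = term count / number of terms in the document
    idf = log(N / df), the plain textbook form (assumed here; libraries
          often apply smoothing and normalization on top of this)
    """
    term_lists = [extract_terms(d) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for terms in term_lists:
        df.update(set(terms))

    weights = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Hypothetical toy comments, invented for illustration (not from the dataset).
comments = [
    "why is the lockdown extended again",
    "hope the lockdown ends soon",
    "stay home and stay safe",
]
features = tfidf(comments)
```

In a pipeline like the one the abstract describes, vectors of this kind would be passed to a classifier such as a Support Vector Machine. Terms that occur in every document receive a weight of zero under this unsmoothed formula, which is one reason library implementations usually smooth the idf term.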