Solving unbalanced data for Thai sentiment analysis

Warunya Wunnasri, T. Theeramunkong, C. Haruechaiyasak
The 2013 10th International Joint Conference on Computer Science and Software Engineering (JCSSE), published 2013-05-29. DOI: 10.1109/JCSSE.2013.6567345
Citations: 5

Abstract

The microblogging service Twitter has grown dramatically among online users in Thailand. Communication on Twitter is lively and up to date because users often express their feelings and sentiments in posts about current or newly emerging topics. Sentiment analysis on Twitter faces language-related challenges, such as short messages and variation in word usage, and it also faces the problem of class imbalance: on Twitter, people tend to post complaints more often than praise. In this paper, we propose a sampling-based method to address class imbalance in Thai Twitter sentiment analysis. Three sampling methods, called random sampling, largest complete-link sampling, and largest average-link sampling, are applied as a preprocessing step before a k-NN classifier. In the experiments, largest average-link sampling achieves the highest performance, with a macro-averaged F-measure of 0.57, compared with the unbalanced case.
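The abstract does not define the sampling procedures in detail. As a rough illustration only, the sketch below assumes that "random" sampling means random undersampling of every class down to the size of the smallest class. It follows this with a plain k-NN classifier using Euclidean distance and evaluates with the macro-averaged F-measure (the unweighted mean of per-class F1). The function names, the toy feature vectors, and `k=3` are hypothetical. The paper's largest complete-link and largest average-link selection rules are not reproduced here.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical representation: each tweet is already a numeric feature
# vector paired with a sentiment label such as "neg" or "pos".
Example = tuple[list[float], str]


def random_undersample(data: list[Example], seed: int = 0) -> list[Example]:
    """Assumed 'random' sampling: keep a random subset of every class,
    sized to match the smallest class, so all classes end up equal."""
    rng = random.Random(seed)
    by_label: dict[str, list[Example]] = defaultdict(list)
    for example in data:
        by_label[example[1]].append(example)
    target = min(len(items) for items in by_label.values())
    balanced: list[Example] = []
    for items in by_label.values():
        balanced.extend(rng.sample(items, target))
    rng.shuffle(balanced)
    return balanced


def knn_predict(train: list[Example], query: list[float], k: int = 3) -> str:
    """Label the query by majority vote among its k nearest training
    examples under Euclidean distance."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def macro_f1(gold: list[str], pred: list[str]) -> float:
    """Unweighted mean of per-class F1 scores, so a minority class
    counts as much as the majority class."""
    scores = []
    for c in sorted(set(gold) | set(pred)):
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy, made-up data: 8 "neg" examples versus 2 "pos" examples.
    train = [([0.0, float(i)], "neg") for i in range(8)]
    train += [([5.0, float(i)], "pos") for i in range(2)]
    balanced = random_undersample(train, seed=1)
    test = [([0.5, 3.0], "neg"), ([4.5, 1.0], "pos")]
    gold = [label for _, label in test]
    pred = [knn_predict(balanced, x, k=3) for x, _ in test]
    print(Counter(label for _, label in balanced), macro_f1(gold, pred))
```

Macro averaging weights each class equally. This makes it a natural metric when complaints far outnumber praise. A classifier that always predicts the majority class can reach high accuracy, but its F1 on the minority class is zero, which pulls the macro-averaged score down.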