Supervised term weighting for sentiment analysis

Tam T. Nguyen, Kuiyu Chang, S. Hui
Proceedings of 2011 IEEE International Conference on Intelligence and Security Informatics. Published 2011-07-10. DOI: 10.1109/ISI.2011.5984056 (https://doi.org/10.1109/ISI.2011.5984056)
Citations: 12

Abstract

Vector space text classification is commonly used in intelligence applications such as email and conversation analysis. In this paper we propose a supervised term weighting scheme called tf × KL (term frequency Kullback-Leibler), which weights each word proportionally to the ratio of its document frequencies in the positive and negative classes. We then generalize tf × KL to deal effectively with class imbalance, which is very common in real-world intelligence analysis. The generalized tf × KL weights each word according to the ratio of the positive and negative class-conditioned word probabilities instead of the raw document frequencies. Results on four classification datasets show that tf × KL performs consistently better than the baseline tf × idf and four other supervised term weighting schemes, including the recently proposed tf × rf (term frequency relevance frequency). The generalized tf × KL was found to be extremely robust in dealing with highly skewed class distributions, beating the second runner-up by more than 20% on a dataset that has only 10% positive training examples. The generalized tf × KL is thus an effective and robust term weighting scheme that can significantly improve binary classification performance in sentiment analysis and intelligence applications.
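The abstract describes the two weighting variants only in words, so the following Python sketch is an illustration under stated assumptions rather than the authors' formulation. It assumes the weight is the logarithm of the positive-to-negative ratio and that add-one smoothing is used to avoid division by zero; neither detail is given in the abstract. The function names (kl_style_weights, weight_document) and the smoothing parameter are hypothetical.

```python
import math
from collections import Counter


def kl_style_weights(docs, labels, smoothing=1.0, use_class_probabilities=True):
    """Per-word weights from class-wise document frequencies.

    docs:   list of token lists
    labels: 1 for a positive document, 0 for a negative one

    ASSUMPTION: the weight is log(ratio) with additive smoothing;
    the abstract only says the weight depends on the ratio.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos

    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        target = df_pos if y == 1 else df_neg
        target.update(set(tokens))  # count each word once per document

    weights = {}
    for word in set(df_pos) | set(df_neg):
        if use_class_probabilities:
            # Generalized variant: ratio of class-conditioned probabilities,
            # i.e. document frequencies normalized by class size.
            p = (df_pos[word] + smoothing) / (n_pos + 2 * smoothing)
            q = (df_neg[word] + smoothing) / (n_neg + 2 * smoothing)
        else:
            # Basic variant: ratio of raw document frequencies.
            p = df_pos[word] + smoothing
            q = df_neg[word] + smoothing
        weights[word] = math.log(p / q)
    return weights


def weight_document(tokens, weights):
    """Multiply each term frequency by its supervised weight."""
    tf = Counter(tokens)
    return {word: tf[word] * weights.get(word, 0.0) for word in tf}


if __name__ == "__main__":
    # Toy imbalanced corpus: 2 positive and 8 negative documents.
    docs = [["great", "the"], ["great", "the"]] + [["bad", "the"]] * 8
    labels = [1, 1] + [0] * 8
    raw = kl_style_weights(docs, labels, use_class_probabilities=False)
    gen = kl_style_weights(docs, labels, use_class_probabilities=True)
    print("raw weight of 'the':        ", raw["the"])
    print("generalized weight of 'the':", gen["the"])
```

In this toy corpus the word "the" occurs in every document, so it carries no class information. The raw-frequency ratio still pushes it strongly toward the larger negative class, while the class-conditioned ratio keeps its weight close to zero. That is the intuition behind the robustness to skewed class distributions claimed in the abstract.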