Classifying Positive or Negative Text Using Features Based on Opinion Words and Term Frequency - Inverse Document Frequency

Sasiporn Tongman, N. Wattanakitrungroj
Published in: 2018 5th International Conference on Advanced Informatics: Concept Theory and Applications (ICAICTA)
Publication date: 2018-08-01
DOI: https://doi.org/10.1109/ICAICTA.2018.8541274
Citations: 7

Abstract

Content on websites and social networks is generated rapidly. Opinions and reviews can be analyzed and classified into two classes, positive or negative, by machine learning methods. The main issue, however, is how to represent each text as a proper set of variables, a p-feature vector, so that a successful classifier can be obtained by a supervised learning approach with suitable parameter settings. In this study, a two-feature vector representing the positive and negative moods in each text was prepared using lists of positive and negative words, and then combined with term frequency-inverse document frequency (TF-IDF) features. kNN and SVM classifiers were built on this feature set and on another baseline set, then compared by predicting each test vector and measuring their effectiveness. Text reviews from Yelp, Amazon, and IMDB were used in experiments with 10-fold cross-validation, parameter variation, and feature set reduction using PCA. The best accuracy across these three datasets, ~0.81-0.87, was obtained by SVM classifiers, each with a reduced feature set much smaller than the original.
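The feature construction described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the word lists, the tokenizer, the helper names, and the TF-IDF variant (tf = count / document length, idf = log(N / df)) are all assumptions, since the abstract specifies none of them.

```python
import math
from collections import Counter

# Hypothetical toy lexicons; the paper's actual word lists are not given in the abstract.
POSITIVE_WORDS = {"good", "great", "love", "excellent"}
NEGATIVE_WORDS = {"bad", "poor", "hate", "terrible"}


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,!?;:\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def opinion_features(tokens):
    """Two-feature vector: counts of positive and negative lexicon words."""
    pos = sum(1 for tok in tokens if tok in POSITIVE_WORDS)
    neg = sum(1 for tok in tokens if tok in NEGATIVE_WORDS)
    return [float(pos), float(neg)]


def tfidf_features(tokenized_docs):
    """Assumed TF-IDF variant: tf = count / doc length, idf = log(N / df)."""
    n_docs = len(tokenized_docs)
    vocab = sorted({tok for doc in tokenized_docs for tok in doc})
    doc_freq = Counter(tok for doc in tokenized_docs for tok in set(doc))
    idf = {tok: math.log(n_docs / doc_freq[tok]) for tok in vocab}
    rows = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        rows.append([counts[tok] / length * idf[tok] for tok in vocab])
    return vocab, rows


def build_feature_vectors(texts):
    """Concatenate the two opinion features with the TF-IDF features."""
    tokenized = [tokenize(t) for t in texts]
    vocab, tfidf_rows = tfidf_features(tokenized)
    vectors = [opinion_features(doc) + row
               for doc, row in zip(tokenized, tfidf_rows)]
    return vocab, vectors


# Made-up example reviews, for illustration only.
reviews = [
    "Great food and excellent service!",
    "Terrible plot, bad acting.",
    "Good phone, poor battery.",
]
vocab, vectors = build_feature_vectors(reviews)
```

Each resulting vector has 2 + |vocabulary| entries, with the positive and negative word counts in the first two positions. Vectors of this form are what a kNN or SVM classifier would then be trained on, optionally after PCA reduction.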