Robustness of Word and Character N-gram Combinations in Detecting Deceptive and Truthful Opinions

A. Siagian, M. Aritsugi
Journal of Data and Information Quality (JDIQ), pp. 1-24. Journal article, published 2020-01-15. DOI: https://doi.org/10.1145/3349536
Citations: 7

Abstract

Opinions in reviews about the quality of products or services can be important information for readers. Unfortunately, such opinions may include deceptive ones posted for business reasons. To keep opinions a valuable and trusted source of information, we propose an approach to detecting deceptive and truthful opinions. Specifically, we explore word and character n-gram combinations, function words, and word syntactic n-grams (word sn-grams) as classifier features for this task. We also consider applying word correction to the dataset. Our experiments show that classification with the word and character n-gram combination features could perform better than classification with the other features. Although the experiments indicate that word correction might have no significant effect, we note that deceptive opinions tend to contain fewer erroneous words than truthful ones. To examine the robustness of our features, we then perform cross-classification tests. The results of these tests suggest that the word and character n-gram combination features could work well in detecting deceptive and truthful opinions. Interestingly, they also indicate that using word sn-grams as combination features could give good performance.
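As a rough illustration of what a word and character n-gram combination feature set can look like, the sketch below counts word n-grams and character n-grams from one review and merges them into a single feature bag. It is not the authors' implementation: the whitespace tokenization, the lowercasing, the chosen n-gram orders, and the function names (`word_ngrams`, `char_ngrams`, `combined_features`) are all assumptions made for this example.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return word n-grams as space-joined strings (assumed: lowercase, whitespace tokens)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return character n-grams over the lowercased text, spaces included."""
    chars = text.lower()
    return [chars[i:i + n] for i in range(len(chars) - n + 1)]


def combined_features(text, word_n=(1, 2), char_n=(3,)):
    """Merge word and character n-gram counts into one feature bag.

    The default orders (word 1- and 2-grams, character 3-grams) are
    illustrative assumptions, not settings taken from the paper.
    Keys are tagged ("w" or "c", n, gram) so the two families never collide.
    """
    features = Counter()
    for n in word_n:
        for gram in word_ngrams(text, n):
            features[("w", n, gram)] += 1
    for n in char_n:
        for gram in char_ngrams(text, n):
            features[("c", n, gram)] += 1
    return features


feats = combined_features("The room was great")
```

A bag like `feats` could then be converted to a vector and passed to any standard classifier. Tagging each key with its family and order keeps a word unigram such as "was" distinct from the identical character trigram.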