Text mining and probabilistic language modeling for online review spam detection

ACM Trans. Manag. Inf. Syst. Pub Date: 2012-01-05. DOI: 10.1145/2070710.2070716

Raymond Y. K. Lau, S. Liao, R. Kwok, Kaiquan Xu, Yunqing Xia, Yuefeng Li

Citations: 207

Abstract

In the era of Web 2.0, huge volumes of consumer reviews are posted to the Internet every day. Manual approaches to detecting and analyzing fake reviews (i.e., spam) are not practical due to the problem of information overload. However, the design and development of automated methods of detecting fake reviews is a challenging research problem. The main reason is that fake reviews are specifically composed to mislead readers, so they may appear the same as legitimate reviews (i.e., ham). As a result, discriminatory features that would enable individual reviews to be classified as spam or ham may not be available. Guided by the design science research methodology, the main contribution of this study is the design and instantiation of novel computational models for detecting fake reviews. In particular, a novel text mining model is developed and integrated into a semantic language model for the detection of untruthful reviews. The models are then evaluated based on a real-world dataset collected from amazon.com. The results of our experiments confirm that the proposed models outperform other well-known baseline models in detecting fake reviews. To the best of our knowledge, the work discussed in this article represents the first successful attempt to apply text mining methods and semantic language models to the detection of fake consumer reviews. A managerial implication of our research is that firms can apply our design artifacts to monitor online consumer reviews to develop effective marketing or product design strategies based on genuine consumer feedback posted to the Internet.
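The abstract does not give the model's equations. For readers unfamiliar with probabilistic language modeling, the following is a minimal sketch of one standard building block, assuming unigram review models, Jelinek-Mercer smoothing against a collection-wide model (λ = 0.2 is an arbitrary choice), and KL divergence as the dissimilarity score between two reviews. It is not the authors' semantic language model, and it omits the text mining component the abstract describes; all function names are hypothetical. Reviews with nearly identical word distributions get a divergence near zero, which is one kind of signal a detector might use.

```python
import math
import re
from collections import Counter

# Hypothetical helpers: a generic unigram language-model sketch, not the paper's model.

def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def collection_model(docs: list[list[str]]) -> dict[str, float]:
    """Maximum-likelihood unigram model P(w|C) over every review in the collection."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def review_model(doc: list[str], coll: dict[str, float], lam: float = 0.2) -> dict[str, float]:
    """Jelinek-Mercer smoothing: P(w|d) = (1 - lam) * tf(w, d) / |d| + lam * P(w|C).

    Every word in the collection vocabulary gets nonzero mass when lam > 0,
    so the KL divergence below never divides by zero. Words absent from the
    collection vocabulary are ignored, so build `coll` from all compared reviews.
    """
    tf = Counter(doc)
    n = len(doc)
    if n == 0:
        return dict(coll)  # an empty review falls back to the collection model
    return {w: (1 - lam) * tf[w] / n + lam * p for w, p in coll.items()}

def kl_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """KL(p || q) = sum_w p(w) * log(p(w) / q(w)) over the shared vocabulary."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)

reviews = [
    "Great phone, battery lasts all day and the screen is bright.",
    "Great phone! Battery lasts all day, and the screen is bright.",
    "The zipper broke after a week; customer service never replied.",
]
docs = [tokenize(r) for r in reviews]
coll = collection_model(docs)
models = [review_model(d, coll) for d in docs]

near_dup = kl_divergence(models[0], models[1])   # identical token counts, so exactly 0.0
unrelated = kl_divergence(models[0], models[2])  # different wording, so strictly positive
print(f"near-duplicate: {near_dup:.4f}, unrelated: {unrelated:.4f}")
```

Smoothing is the key design choice here: without the collection term, any word missing from one review would make the divergence infinite. A smaller λ trusts each review's own word counts more.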
