Fake Reviews Classification using Deep Learning

Shahbaz Ashraf, Faisal Rehman, Hana Sharif, Hina Kim, Haseeb Arshad, Hamid Manzoor
DOI: 10.1109/IMCERT57083.2023.10075156
Published in: 2023 International Multi-disciplinary Conference in Emerging Research Trends (IMCERT), 4 January 2023
Citations: 0

Abstract

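Customer decisions are heavily influenced by online reviews. Scammers and spammers can now manipulate consumer behavior by spreading false information in the form of reviews, either by promoting nonexistent goods or by disparaging rival goods. This makes distinguishing fake reviews from genuine ones more important than ever. The standard approach to text classification represents text with a bag-of-words model, which produces sparse features, while word representations learned by neural networks handle unknown words poorly. In this work, we propose an ensemble method that aggregates the predictions of three individual models trained with a multi-view learning approach. Its central idea is to combine bag-of-n-grams with parallel convolutional neural networks (CNNs) to extract useful information from the review text. Using an n-gram embedding layer with small kernel sizes, we exploit local context for the same amount of computation needed to train deep, sophisticated CNNs. To better extract feature representations from text, our CNN-based architecture takes n-gram embeddings as input and processes them with concurrent convolutional blocks. Beyond the linguistic features of the review text, our method also uses non-textual information about reviewer behavior and activity. We evaluate the method on the publicly available Yelp Filtered Dataset and achieve F1 scores of up to 92% in detecting fake reviews.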
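The central idea, n-gram embeddings fed to several convolutional blocks that run in parallel with small kernels, can be illustrated with a forward pass in plain Python. This is a minimal sketch under stated assumptions, not the authors' architecture: the abstract does not give the n-gram order, embedding size, kernel sizes, filter counts, or pooling, so the bigrams, 8-dimensional embeddings, kernel sizes 1 to 3, four filters per branch, ReLU, and global max-pooling used here are illustrative choices, and all function and class names are hypothetical. The weights are random and untrained; a real model would learn the embeddings and kernels by backpropagation.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]
Kernel = List[Vector]  # k rows, each of length dim


def ngram_sequence(tokens: Sequence[str], n: int = 2) -> List[str]:
    """Sliding-window word n-grams, kept in order so convolutions see local context."""
    if len(tokens) < n:
        return [" ".join(tokens)]
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class NgramEmbedding:
    """Hypothetical lookup table: an unseen n-gram gets a small random vector on first use.

    A trained model would learn these vectors; here they are random placeholders.
    """

    def __init__(self, dim: int, seed: int = 0) -> None:
        self.dim = dim
        self.rng = random.Random(seed)
        self.table: Dict[str, Vector] = {}

    def __call__(self, grams: Sequence[str]) -> List[Vector]:
        for g in grams:
            if g not in self.table:
                self.table[g] = [self.rng.uniform(-0.1, 0.1) for _ in range(self.dim)]
        return [self.table[g] for g in grams]


def conv_block(seq: List[Vector], kernels: List[Kernel]) -> List[float]:
    """1-D convolution (no bias) + ReLU + global max-pooling; one output per kernel."""
    k = len(kernels[0])
    dim = len(seq[0])
    padded = seq + [[0.0] * dim] * max(0, k - len(seq))  # zero-pad inputs shorter than k
    pooled = []
    for kernel in kernels:
        best = 0.0  # starting at 0 makes this the max over ReLU(activation)
        for start in range(len(padded) - k + 1):
            s = sum(w * x for row, vec in zip(kernel, padded[start:start + k])
                    for w, x in zip(row, vec))
            best = max(best, s)
        pooled.append(best)
    return pooled


def init_kernels(dim: int, kernel_sizes=(1, 2, 3), n_filters: int = 4,
                 seed: int = 1) -> Dict[int, List[Kernel]]:
    """Random (untrained) kernels for each parallel branch, keyed by kernel size."""
    rng = random.Random(seed)
    return {
        k: [[[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(k)]
            for _ in range(n_filters)]
        for k in kernel_sizes
    }


def parallel_cnn_features(seq: List[Vector],
                          branches: Dict[int, List[Kernel]]) -> List[float]:
    """Apply every branch to the same input and concatenate the pooled outputs."""
    features: List[float] = []
    for k in sorted(branches):
        features.extend(conv_block(seq, branches[k]))
    return features


tokens = "great food friendly staff great prices".split()
emb = NgramEmbedding(dim=8)
seq = emb(ngram_sequence(tokens, n=2))
features = parallel_cnn_features(seq, init_kernels(dim=8))
print(len(features))  # 3 branches x 4 filters = 12
```

Each branch reads the same n-gram sequence through a different window width, and the pooled outputs are concatenated into one fixed-length vector (here 12 values) regardless of review length, which a classification layer could then consume.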
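The abstract states that the final prediction aggregates three models trained on different views, but it does not name the aggregation rule. The sketch below assumes simple, optionally weighted, soft voting: average each model's probability that a review is fake, threshold at 0.5, and score the result with F1 on the fake class, the metric the abstract reports. The three views (an n-gram CNN, linguistic features, reviewer behavior), the function names, and all numbers are hypothetical toy values, not the paper's models or results.

```python
from typing import List, Optional, Sequence


def soft_vote(model_probs: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted average of per-review 'fake' probabilities from several models.

    model_probs[m][i] is model m's probability that review i is fake.
    Soft voting is an assumed aggregation rule, not one confirmed by the paper.
    """
    if weights is None:
        weights = [1.0] * len(model_probs)
    total = sum(weights)
    n_reviews = len(model_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
        for i in range(n_reviews)
    ]


def threshold(probs: Sequence[float], cutoff: float = 0.5) -> List[int]:
    """1 = predicted fake, 0 = predicted genuine."""
    return [1 if p >= cutoff else 0 for p in probs]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for the positive (fake) class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # avoids division by zero when precision or recall is undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy per-review probabilities of "fake" from three hypothetical views (not paper data).
text_cnn = [0.9, 0.2, 0.6, 0.4]
linguistic = [0.8, 0.1, 0.3, 0.7]
behavior = [0.7, 0.3, 0.8, 0.2]
y_true = [1, 0, 0, 1]  # 1 = fake, 0 = genuine

ensemble = soft_vote([text_cnn, linguistic, behavior])
y_pred = threshold(ensemble)
print(y_pred, f1_score(y_true, y_pred))  # [1, 0, 1, 0] 0.5
```

Soft voting lets a confident view outweigh two uncertain ones, unlike a hard majority vote; the optional `weights` argument is one way a stronger view, such as the text model, could be given more influence.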