Are Neighbors Alike? A Semi-supervised Probabilistic Collaborative Learning Model for Online Review Spammers Detection

IF 5.1 | Zone 3 (Management) | Q1 INFORMATION SCIENCE & LIBRARY SCIENCE

Information Systems Research Pub Date : 2023-10-10 DOI:10.1287/isre.2022.0047

Zhiang Wu, Guannan Liu, Junjie Wu, Yong Tan


Citations: 0

Abstract


Are Neighbors Alike? A Semi-supervised Probabilistic Collaborative Learning Model for Online Review Spammers Detection

Review spammers undermine the trustworthy environment of online platforms by purposefully posting inauthentic ratings and comments for products or online merchants in order to gain improper benefits. Although a large number of methods have been proposed for spammer detection, several challenges, such as collusion recognition, label scarcity, and biased distributions, persist and call for further investigation. Building on the prevalence of collusive spamming behaviors and on network homophily theory, we introduce a reviewer network to account for explicit co-review relations, and then propose a semi-supervised probabilistic collaborative learning model that captures both reviewers' individual behavioral features and the reviewer network. Our model integrates partial label propagation with a pseudo-labeling strategy and feature-based learning for reviewer network modeling, and is proved theoretically to be a weighted logistic regression on a network-related synthetic data set. Its rich parameters, which characterize the importance of network information, the strength of network homophily, and the value of unlabeled data, make the model more transparent. Empirical evaluations on two distinctive real-life data sets demonstrate the effectiveness of our model and the value of learning from unlabeled data; the reviewer network, after proper trimming, shows a strong homophily effect and plays a vital role. In particular, the proposed model is robust to label scarcity and biased label distribution.
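The abstract gives no formulas, so the following is only a rough sketch of the general idea, not the authors' model. It assumes a co-review network trimmed by a hypothetical `min_shared` threshold, pseudo-labels taken as the average label of labeled neighbors, and a plain weighted logistic regression in which pseudo-labeled reviewers are down-weighted by a hypothetical `UNLABELED_WEIGHT`. The data, feature values, and all function names are invented for illustration.

```python
import math
from collections import defaultdict
from itertools import combinations


def build_reviewer_network(reviews, min_shared=2):
    """Link reviewers who co-reviewed at least `min_shared` products (assumed trimming rule)."""
    by_product = defaultdict(set)
    for reviewer, product in reviews:
        by_product[product].add(reviewer)
    shared = defaultdict(int)
    for reviewers in by_product.values():
        for a, b in combinations(sorted(reviewers), 2):
            shared[(a, b)] += 1
    network = defaultdict(set)
    for (a, b), count in shared.items():
        if count >= min_shared:
            network[a].add(b)
            network[b].add(a)
    return network


def pseudo_labels(network, labels, nodes):
    """Soft label for an unlabeled reviewer: share of its labeled neighbors that are spammers."""
    soft = {}
    for node in nodes:
        if node in labels:
            continue
        known = [labels[n] for n in network.get(node, ()) if n in labels]
        if known:
            soft[node] = sum(known) / len(known)
    return soft


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_weighted_logreg(rows, epochs=500, lr=0.5):
    """Gradient descent on a weighted log-loss; each row is (features, target, weight)."""
    dim = len(rows[0][0])
    w, b = [0.0] * dim, 0.0
    total = sum(weight for _, _, weight in rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y, weight in rows:
            err = weight * (sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y)
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / total for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / total
    return w, b


# Toy data (invented): (reviewer, product) pairs and one behavioral feature per reviewer.
reviews = [
    ("r1", "p1"), ("r2", "p1"), ("r3", "p1"),
    ("r1", "p2"), ("r2", "p2"), ("r3", "p2"),
    ("r4", "p3"), ("r5", "p3"), ("r6", "p3"),
    ("r4", "p4"), ("r5", "p4"), ("r6", "p4"),
    ("r3", "p3"),  # r3 shares only one product with r4-r6, so those links are trimmed
]
features = {"r1": [0.9], "r2": [0.8], "r3": [0.7], "r4": [0.1], "r5": [0.2], "r6": [0.3]}
labels = {"r1": 1, "r2": 1, "r4": 0, "r5": 0}  # 1 = spammer, 0 = genuine; r3 and r6 unlabeled

UNLABELED_WEIGHT = 0.5  # hypothetical weight on pseudo-labeled reviewers

network = build_reviewer_network(reviews)
soft = pseudo_labels(network, labels, features)
rows = [(features[r], float(y), 1.0) for r, y in labels.items()]
rows += [(features[r], y, UNLABELED_WEIGHT) for r, y in soft.items()]
w, b = fit_weighted_logreg(rows)


def spam_probability(x):
    """Predicted spammer probability under the fitted toy model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

In this sketch, raising `UNLABELED_WEIGHT` lets the pseudo-labeled reviewers influence the fit more, which loosely mirrors the abstract's notion of a parameter for the value of unlabeled data; the paper's actual parameterization may differ.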


Source journal

Information Systems Research Multiple-

CiteScore: 9.10

Self-citation rate: 8.20%

Articles published: 120

About the journal: ISR (Information Systems Research) is a journal of INFORMS, the Institute for Operations Research and the Management Sciences. Information Systems Research is a leading international journal of theory, research, and intellectual development, focused on information systems in organizations, institutions, the economy, and society.