Are Neighbors Alike? A Semi-supervised Probabilistic Collaborative Learning Model for Online Review Spammers Detection
Authors: Zhiang Wu, Guannan Liu, Junjie Wu, Yong Tan
Journal: Information Systems Research (JCR Q1, Information Science & Library Science; impact factor 5.0)
Published: 2023-10-10
DOI: https://doi.org/10.1287/isre.2022.0047
Review spammers undermine trust in online platforms by deliberately posting inauthentic ratings and comments about products or online merchants to gain improper benefits. Although many methods have been proposed for spammer detection, several challenges, such as collusion recognition, label scarcity, and biased label distributions, persist and call for further investigation. Building on the prevalence of collusive spamming behavior and on network homophily theory, we introduce a reviewer network that accounts for explicit co-review relations, and then propose a semi-supervised probabilistic collaborative learning model that captures both reviewers' individual behavioral features and the reviewer network. The model integrates partial label propagation, via a pseudo-labeling strategy, with feature-based learning for reviewer network modeling, and is proved theoretically to be a weighted logistic regression on a network-related synthetic data set. Its parameters, which characterize the importance of network information, the strength of network homophily, and the value of unlabeled data, make the model more transparent. Empirical evaluations on two distinct real-life data sets demonstrate the effectiveness of the model and the value of learning from unlabeled data; the reviewer network, after proper trimming, shows a strong homophily effect and plays a vital role. In particular, the proposed model is robust to label scarcity and biased label distributions.
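To make the idea of a weighted logistic regression on a synthetic data set concrete, here is a minimal sketch in plain Python. It is not the authors' model: the toy reviewer network, the two behavioral features, the neighbor-averaging rule for pseudo-labels, and the weight `lam` on unlabeled reviewers are all assumptions chosen for illustration. Each unlabeled reviewer receives a soft pseudo-label from its labeled co-review neighbors and is then expanded into weighted synthetic training rows.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def neighbor_pseudo_labels(adj, labels, prior=0.5):
    """Soft pseudo-label for each unlabeled node: mean label of its labeled neighbors."""
    pseudo = {}
    for node, nbrs in adj.items():
        if node in labels:
            continue
        known = [labels[n] for n in nbrs if n in labels]
        pseudo[node] = sum(known) / len(known) if known else prior
    return pseudo

def build_weighted_rows(features, labels, pseudo, lam):
    """Labeled nodes get weight 1; an unlabeled node with pseudo-label p becomes
    a positive row with weight lam*p and a negative row with weight lam*(1-p)."""
    rows = [(features[n], float(y), 1.0) for n, y in labels.items()]
    for n, p in pseudo.items():
        if p > 0:
            rows.append((features[n], 1.0, lam * p))
        if p < 1:
            rows.append((features[n], 0.0, lam * (1.0 - p)))
    return rows

def fit_weighted_logreg(rows, dim, lr=0.5, epochs=2000):
    """Batch gradient descent on the weighted logistic loss."""
    w, b = [0.0] * dim, 0.0
    total = sum(wt for _, _, wt in rows)
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y, wt in rows:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                gw[j] += wt * err * x[j]
            gb += wt * err
        w = [wi - lr * g / total for wi, g in zip(w, gw)]
        b -= lr * gb / total
    return w, b

def score(x, w, b):
    """Estimated probability that a reviewer with features x is a spammer."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical toy data: two behavioral features per reviewer (made up here).
features = {
    "a": [0.9, 0.8], "b": [0.8, 0.9], "c": [0.85, 0.7],
    "d": [0.1, 0.2], "e": [0.2, 0.1], "f": [0.15, 0.3],
}
# Hypothetical co-review network: an edge means two reviewers reviewed the same item.
adj = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"],
    "d": ["e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
labels = {"a": 1, "d": 0}  # only two reviewers are labeled (1 = spammer)

pseudo = neighbor_pseudo_labels(adj, labels)
rows = build_weighted_rows(features, labels, pseudo, lam=0.5)
w, b = fit_weighted_logreg(rows, dim=2)
scores = {n: score(features[n], w, b) for n in features}
for n in sorted(scores):
    print(n, round(scores[n], 3))
```

In this sketch, `lam` plays the role of a knob for how much unlabeled reviewers count relative to labeled ones; setting it to 0 reduces the fit to ordinary logistic regression on the two labeled reviewers.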
About the journal:
ISR (Information Systems Research) is a journal of INFORMS, the Institute for Operations Research and the Management Sciences. Information Systems Research is a leading international journal of theory, research, and intellectual development, focused on information systems in organizations, institutions, the economy, and society.