LENS: label sparsity-tolerant adversarial learning on spatial deceptive reviews

IF 2.2; Zone 4 (Computer Science); JCR Q3, COMPUTER SCIENCE, INFORMATION SYSTEMS

Geoinformatica; Pub Date: 2024-09-14; DOI: 10.1007/s10707-024-00529-5

Sirish Prabakar, Haiquan Chen, Zhe Jiang, Carl Yang, Weikuan Yu, Da Yan


Citations: 0

Abstract

Online businesses and websites have recently become the main target of fake reviews, which are deliberately written to push a business's ratings up or down. Most existing fake review detection methods are supervised, so their performance depends heavily on the amount, quality, and variety of labeled data, which is often difficult to obtain in practice. In this paper, we propose LENS, a semi-supervised, label sparsity-tolerant framework for fake review detection that mines spatial knowledge and learns distributions of embedded topics. LENS builds on two key observations: (1) spatial knowledge revealed by spatial entities and their co-occurring latent topic distributions may indicate review authenticity; and (2) the distribution of embedded topics (the contextual distribution) may exhibit patterns that differentiate real from fake reviews. Specifically, LENS first extracts embeddings for spatial named entities using a knowledge base trained on Wikipedia webpages. Second, LENS represents each input token as a distribution over the learned latent topics in the embedded topic space. To bypass the differentiation difficulty, LENS builds on two discriminators in an actor-critic architecture trained with reinforcement learning. Extensive experiments on real-world spatial and non-spatial datasets show that LENS consistently outperformed state-of-the-art semi-supervised fake review detection methods with few labels, at every labeling rate tested for real and fake reviews, in a label-starved setting.
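The abstract does not spell out how a token becomes "a distribution over the learned latent topics in the embedded topic space." As a minimal sketch, assuming an embedded-topic-model (ETM) style parameterization in which each word embedding rho_w and each topic embedding alpha_k share one vector space, a token's topic distribution can be taken as a softmax over the dot products <rho_w, alpha_k>. The review-level "contextual distribution" is approximated here by averaging the per-token distributions. The function names, the averaging step, and the toy vectors are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def token_topic_distribution(word_vec: Vector, topic_vecs: Sequence[Vector]) -> List[float]:
    """Assumed ETM-style mapping: p(topic k | token) is proportional to exp(<rho_w, alpha_k>)."""
    scores = [sum(w * a for w, a in zip(word_vec, topic)) for topic in topic_vecs]
    return softmax(scores)


def contextual_distribution(
    tokens: Sequence[str],
    word_vecs: Dict[str, Vector],
    topic_vecs: Sequence[Vector],
) -> List[float]:
    """Hypothetical review-level summary: the mean of per-token topic distributions.

    Tokens without an embedding are skipped; if none are known, return a uniform distribution.
    """
    k = len(topic_vecs)
    dists = [token_topic_distribution(word_vecs[t], topic_vecs) for t in tokens if t in word_vecs]
    if not dists:
        return [1.0 / k] * k
    return [sum(d[i] for d in dists) / len(dists) for i in range(k)]


if __name__ == "__main__":
    # Toy 2-D embeddings (hypothetical): two known tokens, two topics.
    topics = [[1.0, 0.0], [0.0, 1.0]]
    vocab = {"hotel": [1.0, 0.0], "downtown": [0.0, 1.0]}
    print(token_topic_distribution(vocab["hotel"], topics))  # approximately [0.731, 0.269]
    print(contextual_distribution(["hotel", "downtown", "unseen"], vocab, topics))  # approximately [0.5, 0.5]
```

Averaging is only the simplest way to summarize a review; a model like the one described could instead feed the per-token distributions to its discriminators directly.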





Source journal

Geoinformatica (Geosciences / Computer Science: Information Systems)

CiteScore: 5.60

Self-citation rate: 10.00%

Review time: 6 months

Journal description: GeoInformatica is located at the confluence of two rapidly advancing domains: Computer Science and Geographic Information Science. Earth studies increasingly use sophisticated computing theory and tools, and the computer processing of Earth observations through Geographic Information Systems (GIS) attracts considerable attention from government, industry, and research. The journal aims to promote the most innovative research results in computer science applied to geographic information systems. GeoInformatica thus provides an effective forum for disseminating original and fundamental research and experience in the rapidly advancing use of computer science for spatial studies.