Unsupervised [randomly responding] survey bot detection: In search of high classification accuracy.

IF 7.8, Tier 1 (Psychology), Q1, PSYCHOLOGY, MULTIDISCIPLINARY

Psychological Methods, Pub Date: 2025-03-10, DOI: 10.1037/met0000746

Carl F Falk, Amaris Huang, Michael John Ilagan

Citations: 0

Abstract

While online survey data collection has become popular in the social sciences, there is a risk of data contamination by computer-generated random responses (i.e., bots). Bot prevalence poses a significant threat to data quality. If deterrence efforts fail or were not set up in advance, researchers can still attempt to detect bots already present in the data. In this research, we study a recently developed algorithm to detect survey bots. The algorithm requires neither a measurement model nor a sample of known humans and bots; thus, it is model agnostic and unsupervised. It involves a permutation test under the assumption that Likert-type items are exchangeable for bots, but not humans. While the algorithm maintains a desired sensitivity for detecting bots (e.g., 95%), its classification accuracy may depend on other inventory-specific or demographic factors. Generating hypothetical human responses from a well-known item response theory model, we use simulations to understand how classification accuracy is affected by item properties, the number of items, the number of latent factors, and factor correlations. In an additional study, we simulate bots to contaminate real human data from 35 publicly available data sets to understand the algorithm's classification accuracy under a variety of real measurement instruments. Through this work, we identify conditions under which classification accuracy is around 95% or above, but also conditions under which accuracy is quite low. In brief, performance is better with more items, more categories per item, and a variety in the difficulty or means of the survey items. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
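The permutation idea in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name a test statistic, so the correlation between one respondent's answers and the item means is an assumption made here, and the names `pearson`, `permutation_p_value`, and `flag_as_bot` are hypothetical. If a bot's answers are exchangeable across items, shuffling them should not change the statistic's distribution, so a respondent is kept as "bot" unless the observed statistic is unusually high relative to its shuffled versions. Rejecting at level `alpha` keeps sensitivity for bots at roughly `1 - alpha` (e.g., 95%).

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(responses, item_means, n_perm=2000, seed=0):
    """One-sided permutation p-value for a single respondent.

    Null hypothesis: the respondent's answers are exchangeable across items
    (the assumed behaviour of a randomly responding bot).
    Statistic (an assumption of this sketch): correlation with the item means.
    """
    rng = random.Random(seed)
    observed = pearson(responses, item_means)
    shuffled = list(responses)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign the same answers to random items
        if pearson(shuffled, item_means) >= observed:
            at_least_as_large += 1
    # The +1 terms count the observed arrangement itself, so p is never 0.
    return (at_least_as_large + 1) / (n_perm + 1)


def flag_as_bot(responses, item_means, alpha=0.05, n_perm=2000, seed=0):
    """Keep the 'bot' label unless exchangeability is rejected at level alpha."""
    return permutation_p_value(responses, item_means, n_perm, seed) > alpha


if __name__ == "__main__":
    # Hypothetical item means for a 10-item, 5-category Likert inventory.
    item_means = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 1.8, 2.8, 3.8]
    tracks_items = [1, 2, 3, 3, 4, 4, 5, 2, 3, 4]   # answers follow the item means
    ignores_items = [5, 4, 4, 3, 3, 2, 1, 4, 3, 2]  # answers run against them
    print(flag_as_bot(tracks_items, item_means))
    print(flag_as_bot(ignores_items, item_means))
```

In this sketch, the statistic carries no information when all item means are nearly equal, and short inventories allow few distinct permutations. Both points are in line with the abstract's summary that performance improves with more items and with variety in item difficulty or means.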

Source journal: Psychological Methods (PSYCHOLOGY, MULTIDISCIPLINARY)

CiteScore: 13.10

Self-citation rate: 7.10%

Articles published: 159

About the journal: Psychological Methods is devoted to the development and dissemination of methods for collecting, analyzing, understanding, and interpreting psychological data. Its purpose is the dissemination of innovations in research design, measurement, methodology, and quantitative and qualitative analysis to the psychological community; its further purpose is to promote effective communication about related substantive and methodological issues. The audience is expected to be diverse and to include those who develop new procedures, those who are responsible for undergraduate and graduate training in design, measurement, and statistics, as well as those who employ those procedures in research.