Wen Zhi Ng, Sundarimaa Erdembileg, Jean Cj Liu, Joseph D Tucker, Rayner Kay Jin Tan
Title: Increasing Rigour in Online Health Surveys Through the Reduction of Fraudulent Data.
Journal: Journal of Medical Internet Research (Impact Factor 5.8; JCR Q1, Health Care Sciences & Services)
Published: 2025-06-26
DOI: 10.2196/68092 (https://doi.org/10.2196/68092)
Citations: 0
Abstract
Online surveys have become a key tool of modern health research, offering a fast, cost-effective, and convenient means of data collection. They enable researchers to access diverse populations, such as those underrepresented in traditional studies, and facilitate the collection of data on stigmatized or sensitive behaviours through greater anonymity. However, the ease of participation also introduces significant challenges, particularly around data integrity and rigour. As fraudulent responses (whether from bots, repeat responders, or individuals misrepresenting themselves) become more sophisticated and pervasive, ensuring the rigour of online surveys has never been more crucial. This article provides a comprehensive synthesis of practical strategies that help to increase the rigour of online surveys through the detection and removal of fraudulent data. Drawing on recent literature and case studies, we outline several options that address the full research cycle, from pre-data-collection strategies to post-data-collection validation. We emphasize the integration of automated screening techniques (e.g. CAPTCHAs, honeypot questions) and attention checks (e.g. trap questions) for purposeful survey design. Robust recruitment procedures (e.g. concealed eligibility criteria, two-stage screening) and a proper incentive or compensation structure can also help to deter fraudulent participation. We examine the merits and limitations of different sampling methodologies, including river sampling, online panels, and crowdsourcing platforms, offering guidance on how to select samples based on specific research objectives. For the post-data-collection stage, we discuss metadata-based techniques to detect fraudulent data (e.g. duplicate email or IP addresses, response time analysis), alongside methods to better screen for low-quality responses (e.g. inconsistent response patterns, improbable qualitative responses). The escalating sophistication of fraud tactics, particularly with the growth of Artificial Intelligence, demands that researchers continuously adapt and stay vigilant. We propose the use of dynamic protocols that combine multiple strategies into a multi-pronged approach, one that can better filter for fraudulent data and evolve depending on the type of responses received across the data-collection process. However, there is still significant room for these strategies to develop, and their development should be a key focus for upcoming research. As online surveys become increasingly integral to health research, investing in robust strategies to screen for fraudulent data and increasing the rigour of studies are key to upholding scientific integrity.
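To make the multi-pronged idea concrete, here is a minimal Python sketch of how several of the checks named in the abstract (duplicate email or IP addresses, response time analysis, honeypot questions, attention checks) might be combined into one set of flags. It is not the authors' protocol: the `Response` fields, the `flag_responses` helper, and the 120-second completion floor are all hypothetical choices made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Response:
    """One survey submission; every field name here is a hypothetical example."""
    respondent_id: str
    email: str
    ip_address: str
    duration_seconds: float       # total time taken to complete the survey
    honeypot_value: str           # hidden field that a human should leave empty
    passed_attention_check: bool  # result of a trap question


def flag_responses(responses, min_duration_seconds=120.0):
    """Return {respondent_id: [reasons]} for submissions that trip any check.

    The 120-second floor is an illustrative assumption, not a recommended value.
    """
    # Normalise emails so trivial case or whitespace changes do not hide repeats.
    email_counts = Counter(r.email.strip().lower() for r in responses)
    ip_counts = Counter(r.ip_address for r in responses)

    flags = {}
    for r in responses:
        reasons = []
        if email_counts[r.email.strip().lower()] > 1:
            reasons.append("duplicate_email")
        if ip_counts[r.ip_address] > 1:
            reasons.append("duplicate_ip")
        if r.duration_seconds < min_duration_seconds:
            reasons.append("too_fast")
        if r.honeypot_value.strip():
            reasons.append("honeypot_filled")
        if not r.passed_attention_check:
            reasons.append("failed_attention_check")
        if reasons:
            flags[r.respondent_id] = reasons
    return flags
```

In this sketch each flag is a signal rather than a verdict: a shared IP address can come from one household or institution, so flagged submissions would presumably go to manual review rather than being dropped automatically, and thresholds would be tuned as responses arrive.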
About the journal:
The Journal of Medical Internet Research (JMIR) is a highly respected publication in health informatics and health services. Founded in 1999, JMIR has been a pioneer in the field for over two decades.
The journal focuses on digital health, data science, health informatics, and emerging technologies for health, medicine, and biomedical research. It is recognized as a top publication in these disciplines, ranking in the first quartile (Q1) by Impact Factor.
JMIR is also ranked #1 on Google Scholar in the "Medical Informatics" discipline.