Open science perspectives on machine learning for the identification of careless responding: A new hope or phantom menace?

IF 4.8; Zone 2 (Psychology); Q1 PSYCHOLOGY, SOCIAL

Social and Personality Psychology Compass. Pub Date: 2024-02-18. DOI: 10.1111/spc3.12941

Andreas Alfons, Max Welz


Citations: 0

Abstract

Powerful methods for identifying careless respondents in survey data are not just important to ensure the validity of subsequent data analyses, they are also instrumental for studying the psychological processes that drive humans to respond carelessly. Conversely, a deeper understanding of the phenomenon of careless responding enables the development of improved methods for the identification of careless respondents. While machine learning has gained substantial attention and popularity in many scientific fields, it is largely unexplored for the detection of careless responding. On the one hand, machine learning algorithms can be highly powerful tools due to their flexibility. On the other hand, science based on machine learning has been criticized in the literature for a lack of reproducibility. We assess the potential and the pitfalls of machine learning approaches for identifying careless respondents from an open science perspective. In particular, we discuss possible sources of reproducibility issues when applying machine learning in the context of careless responding, and we give practical guidelines on how to avoid them. Furthermore, we illustrate the high potential of an unsupervised machine learning method for the identification of careless respondents in a proof-of-concept simulation experiment. Finally, we stress the necessity of building an open data repository with labeled benchmark data sets, which would enable the evaluation of methods in a more realistic setting and make it possible to train supervised learning methods. Without such a data repository, the true potential of machine learning for the identification of careless responding may fail to be unlocked.




Source journal

Social and Personality Psychology Compass Psychology-Social Psychology

CiteScore

5.20

Self-citation rate

2.20%

Number of articles published