An evaluation of the performance of stopping rules in AI-aided screening for psychological meta-analytical research

IF 6.1 2区生物学 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Research Synthesis Methods Pub Date : 2024-10-16 DOI:10.1002/jrsm.1762

Lars König, Steffen Zitzmann, Tim Fütterer, Diego G. Campos, Ronny Scherer, Martin Hecht

{"title":"An evaluation of the performance of stopping rules in AI-aided screening for psychological meta-analytical research","authors":"Lars König, Steffen Zitzmann, Tim Fütterer, Diego G. Campos, Ronny Scherer, Martin Hecht","doi":"10.1002/jrsm.1762","DOIUrl":null,"url":null,"abstract":"<p>Several AI-aided screening tools have emerged to tackle the ever-expanding body of literature. These tools employ active learning, where algorithms sort abstracts based on human feedback. However, researchers using these tools face a crucial dilemma: When should they stop screening without knowing the proportion of relevant studies? Although numerous stopping rules have been proposed to guide users in this decision, they have yet to undergo comprehensive evaluation. In this study, we evaluated the performance of three stopping rules: the knee method, a data-driven heuristic, and a prevalence estimation technique. We measured performance via sensitivity, specificity, and screening cost and explored the influence of the prevalence of relevant studies and the choice of the learning algorithm. We curated a dataset of abstract collections from meta-analyses across five psychological research domains. Our findings revealed performance differences between stopping rules regarding all performance measures and variations in the performance of stopping rules across different prevalence ratios. Moreover, despite the relatively minor impact of the learning algorithm, we found that specific combinations of stopping rules and learning algorithms were most effective for certain prevalence ratios of relevant abstracts. Based on these results, we derived practical recommendations for users of AI-aided screening tools. Furthermore, we discuss possible implications and offer suggestions for future research.</p>","PeriodicalId":226,"journal":{"name":"Research Synthesis Methods","volume":"15 6","pages":"1120-1146"},"PeriodicalIF":6.1000,"publicationDate":"2024-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/jrsm.1762","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Research Synthesis Methods","FirstCategoryId":"99","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/jrsm.1762","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}

引用次数: 0

Abstract

Several AI-aided screening tools have emerged to tackle the ever-expanding body of literature. These tools employ active learning, where algorithms sort abstracts based on human feedback. However, researchers using these tools face a crucial dilemma: When should they stop screening without knowing the proportion of relevant studies? Although numerous stopping rules have been proposed to guide users in this decision, they have yet to undergo comprehensive evaluation. In this study, we evaluated the performance of three stopping rules: the knee method, a data-driven heuristic, and a prevalence estimation technique. We measured performance via sensitivity, specificity, and screening cost and explored the influence of the prevalence of relevant studies and the choice of the learning algorithm. We curated a dataset of abstract collections from meta-analyses across five psychological research domains. Our findings revealed performance differences between stopping rules regarding all performance measures and variations in the performance of stopping rules across different prevalence ratios. Moreover, despite the relatively minor impact of the learning algorithm, we found that specific combinations of stopping rules and learning algorithms were most effective for certain prevalence ratios of relevant abstracts. Based on these results, we derived practical recommendations for users of AI-aided screening tools. Furthermore, we discuss possible implications and offer suggestions for future research.

Abstract Image

查看原文本刊更多论文

评估人工智能辅助筛选心理元分析研究中停止规则的性能。

为了应对不断扩大的文献数量，出现了几种人工智能辅助筛选工具。这些工具采用了主动学习技术，算法会根据人类的反馈对摘要进行排序。然而，使用这些工具的研究人员面临着一个重要的难题：在不知道相关研究比例的情况下，何时应该停止筛选？虽然已经提出了许多停止规则来指导用户做出这一决定，但这些规则尚未经过全面评估。在本研究中，我们评估了三种终止规则的性能：膝关节法、数据驱动启发式和流行率估计技术。我们通过灵敏度、特异性和筛选成本来衡量性能，并探讨了相关研究的流行程度和学习算法选择的影响。我们整理了来自五个心理学研究领域荟萃分析的摘要数据集。我们的研究结果表明，停止规则在所有性能指标上都存在性能差异，而且停止规则的性能在不同的流行率下也存在差异。此外，尽管学习算法的影响相对较小，但我们发现特定的停止规则和学习算法组合对于特定流行率的相关摘要最为有效。基于这些结果，我们为人工智能辅助筛选工具的用户提出了实用建议。此外，我们还讨论了可能的影响，并对未来的研究提出了建议。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Research Synthesis Methods MATHEMATICAL & COMPUTATIONAL BIOLOGYMULTID-MULTIDISCIPLINARY SCIENCES

CiteScore

16.90

自引率

3.10%

发文量

期刊介绍： Research Synthesis Methods is a reputable, peer-reviewed journal that focuses on the development and dissemination of methods for conducting systematic research synthesis. Our aim is to advance the knowledge and application of research synthesis methods across various disciplines. Our journal provides a platform for the exchange of ideas and knowledge related to designing, conducting, analyzing, interpreting, reporting, and applying research synthesis. While research synthesis is commonly practiced in the health and social sciences, our journal also welcomes contributions from other fields to enrich the methodologies employed in research synthesis across scientific disciplines. By bridging different disciplines, we aim to foster collaboration and cross-fertilization of ideas, ultimately enhancing the quality and effectiveness of research synthesis methods. Whether you are a researcher, practitioner, or stakeholder involved in research synthesis, our journal strives to offer valuable insights and practical guidance for your work.