Human-in-the-Loop Artificial Intelligence System for Systematic Literature Review: Methods and Validations for the AutoLit Review Software

Cochrane Evidence Synthesis and Methods. Pub Date: 2025-10-25. DOI: 10.1002/cesm.70059

Kevin M. Kallmes, Jade Thurnham, Marius Sauca, Ranita Tarchand, Keith R. Kallmes, Karl J. Holub



Abstract

Introduction

Artificial intelligence (AI) tools have been used for individual stages of the systematic literature review (SLR) process, but no tool has previously been shown to support every critical SLR step. The need for expert oversight to ensure the quality of SLR findings has also been recognized. Here, we describe a complete methodology for using our AI SLR tool with human-in-the-loop curation workflows, and we report AI validations, time savings, and approaches to ensure compliance with best review practices.

Methods

SLRs require Search, Screening, and Extraction from relevant studies, with meta-analysis and critical appraisal where relevant. We present a full methodological framework for completing SLRs with our AutoLit software (Nested Knowledge). The system integrates AI models into the central SLR steps: Search strategy generation, Dual Screening of Titles/Abstracts and Full Texts, and Extraction of qualitative and quantitative evidence. It also offers manual Critical Appraisal and Insight drafting, and fully automated Network Meta-analysis. We report validations comparing AI performance to that of experts and, where relevant, time savings and ‘rapid review’ alternatives to the SLR workflow.

Results

Search strategy generation with the Smart Search AI can turn a Research Question into full Boolean strings, with 76.8% and 79.6% Recall in two validation sets. Supervised machine learning tools can achieve 82–97% Recall in reviewer-level Screening. Extraction of Population, Interventions/Comparators, and Outcomes (PICOs) achieved an F1 of 0.74; accuracy for study type, location, and size was 74%, 78%, and 91%, respectively. Time savings of 50% in Abstract Screening and 70–80% in qualitative extraction were reported. Extraction of user-specified qualitative and quantitative tags and data elements remains exploratory and requires human curation for SLRs.
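For readers less familiar with the metrics reported here, the following is a minimal sketch of how Recall and F1 are conventionally computed when an AI output is compared against an expert reference set. It is not the authors' validation code. The record identifiers, the function names, and the counts are hypothetical and chosen only for illustration; the abstract does not describe how the validation sets were constructed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from comparing AI output with an expert reference set."""
    tp: int  # records both the AI and the experts marked relevant
    fp: int  # records only the AI marked relevant
    fn: int  # relevant records the AI missed


def confusion_from_sets(predicted: set[str], relevant: set[str]) -> Confusion:
    """Build confusion counts from two sets of record identifiers."""
    return Confusion(
        tp=len(predicted & relevant),
        fp=len(predicted - relevant),
        fn=len(relevant - predicted),
    )


def recall(c: Confusion) -> float:
    """Share of expert-relevant records that the AI retrieved."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def precision(c: Confusion) -> float:
    """Share of AI-retrieved records that the experts marked relevant."""
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def f1(c: Confusion) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical example: experts marked 10 records relevant; the AI
# returned 12 records, 8 of which are in the expert set.
expert_relevant = {f"rec{i}" for i in range(10)}
ai_retrieved = {f"rec{i}" for i in range(8)} | {f"x{i}" for i in range(4)}
example = confusion_from_sets(ai_retrieved, expert_relevant)

if __name__ == "__main__":
    print(f"recall={recall(example):.3f} f1={f1(example):.3f}")
```

Recall is the metric usually emphasized for Search and Screening because a missed relevant study cannot be recovered later in the review, while F1 balances missed items against spurious ones, which suits extraction tasks such as PICOs.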

Conclusion

AI systems can support high-quality, human-in-the-loop execution of key SLR stages. Transparency, replicability, and expert oversight are central to the use of AI SLR tools.

