Artificial Intelligence and Machine Learning to Improve Evidence Synthesis Production Efficiency: An Observational Study of Resource Use and Time-to-Completion

Cochrane Evidence Synthesis and Methods. Published: 2025-05-19. DOI: 10.1002/cesm.70030

Christopher James Rose, Jose Francisco Meneses-Echavez, Ashley Elizabeth Muller, Rigmor C. Berg, Tiril C. Borge, Patricia Sofia Jacobsen Jardim, Chris Cooper


Abstract

Introduction

Evidence syntheses are crucial in healthcare and elsewhere but are resource-intensive, often taking years to produce. Artificial intelligence and machine learning (AI/ML) tools may improve production efficiency in certain review phases, but little is known about their impact on entire reviews.

Methods

We performed prespecified analyses of a convenience sample of eligible healthcare- or welfare-related reviews commissioned at the Norwegian Institute of Public Health between August 1, 2020 (the first commission to use AI/ML) and January 31, 2023 (the administrative cut-off). The main exposures were AI/ML use following an internal support team's recommendation versus no use. Eligible tool types were ranking (e.g., priority screening), classification (e.g., study design), clustering (e.g., documents), and bibliometric analysis (e.g., OpenAlex); eligibility was defined by tool type, and no specific tool was included or excluded. Generative AI tools were not widely available during the study period. The outcomes were resource use (person-hours) and time from commission to completion (defined as approval for delivery, including peer review; measured in weeks). Analyses accounted for nonrandomized assignment and censored outcomes (reviews still ongoing at the cut-off). Researchers classifying exposures were blinded to outcomes, and the statistician was blinded to exposure.
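Reviews still ongoing at the cut-off have only a lower bound on their time-to-completion, so they are right-censored. A standard way to describe such data is the Kaplan-Meier estimator, sketched below in plain Python. This is an illustration of how censored reviews can be handled, not necessarily the method the authors used: the abstract does not name the model, and this sketch makes no adjustment for nonrandomized assignment. The function name `kaplan_meier` and the five-review data set are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, completed):
    """Kaplan-Meier estimate of the proportion of reviews still ongoing.

    times: weeks from commission to completion, or to the administrative
        cut-off for reviews that were still ongoing (right-censored).
    completed: True if the review was completed, False if it was censored.
    Returns (week, estimated proportion still ongoing) at each completion time.
    """
    if len(times) != len(completed):
        raise ValueError("times and completed must have the same length")
    all_counts = Counter(times)  # reviews leaving the risk set at each time
    completions = Counter(t for t, done in zip(times, completed) if done)
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(all_counts):
        d = completions[t]  # completions observed at week t (0 if none)
        if d:
            # Censored reviews tied at t are still counted as at risk here.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Completed and censored reviews both leave the risk set after t.
        at_risk -= all_counts[t]
    return curve


# Hypothetical toy data: five reviews, two still ongoing at the cut-off.
weeks = [10, 20, 20, 30, 40]
done = [True, True, False, True, False]
for week, still_ongoing in kaplan_meier(weeks, done):
    print(f"week {week}: {still_ongoing:.2f} still ongoing")
```

On the toy data this prints estimated proportions of 0.80, 0.60, and 0.30 still ongoing at weeks 10, 20, and 30. The censored reviews shrink the risk set without being counted as completions; dropping them would discard the information that they lasted at least 20 and 40 weeks, and treating them as completed at the cut-off would understate time-to-completion.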

Results

Among the 39 reviews, 7 (18%) were health technology assessments (the remainder were systematic reviews), 19 (49%) focused on healthcare (the remainder on welfare), 18 (46%) planned a meta-analysis, and 3 (8%) were still ongoing at the cut-off. AI/ML tools were used in 27 (69%) reviews. Reviews that used AI/ML as recommended used more resources (mean 667 vs. 291 person-hours) but were completed slightly faster (27.6 vs. 28.2 weeks). These differences were not statistically significant (relative resource use: 3.71; 95% CI: 0.36–37.95; p = 0.269; relative time-to-completion: 0.92; 95% CI: 0.53–1.58; p = 0.753).
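The reported p-values can be roughly cross-checked against the ratios and their confidence intervals. The sketch below assumes, because the abstract does not state the model, that each 95% CI is a Wald interval symmetric on the log scale, as is typical for ratio estimates from log-link models. Under that assumption the log-scale standard error is the log-scale CI width divided by 2 × 1.96, and the point estimate should lie near the geometric midpoint of the interval. The function names are illustrative and do not come from the paper.

```python
import math

Z_975 = 1.959963984540054  # standard normal 97.5th percentile (for 95% CIs)


def log_midpoint(lower, upper):
    """Geometric midpoint of a CI; near the estimate if the CI is log-symmetric."""
    return math.sqrt(lower * upper)


def p_from_ratio_ci(ratio, lower, upper):
    """Approximate two-sided p-value for H0: ratio = 1.

    Assumes (hypothetically) a Wald 95% CI that is symmetric on the log scale.
    """
    se_log = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    z = math.log(ratio) / se_log
    # 2 * (1 - Phi(|z|)) written via the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


for label, est, lo, hi in [
    ("relative resource use", 3.71, 0.36, 37.95),
    ("relative time-to-completion", 0.92, 0.53, 1.58),
]:
    print(f"{label}: midpoint {log_midpoint(lo, hi):.2f}, "
          f"p ~ {p_from_ratio_ci(est, lo, hi):.3f}")
```

With the published (rounded) inputs this gives p-values of about 0.27 and 0.76, close to the reported 0.269 and 0.753, and geometric midpoints close to 3.71 and 0.92; the small gaps plausibly reflect rounding of the published estimates. Both intervals include 1, which is why neither difference is statistically significant at the 5% level.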

Conclusions

Associations between AI/ML use and the outcomes remain uncertain. Multicenter studies or meta-analyses may be needed to determine whether these tools meaningfully reduce the resources and time needed to produce evidence syntheses.
