Artificial Intelligence and Machine Learning to Improve Evidence Synthesis Production Efficiency: An Observational Study of Resource Use and Time-to-Completion
Christopher James Rose, Jose Francisco Meneses-Echavez, Ashley Elizabeth Muller, Rigmor C. Berg, Tiril C. Borge, Patricia Sofia Jacobsen Jardim, Chris Cooper
Cochrane Evidence Synthesis and Methods, volume 3, issue 3. Published 2025-05-19. DOI: 10.1002/cesm.70030. https://onlinelibrary.wiley.com/doi/10.1002/cesm.70030
Citation count: 0
Abstract
Introduction
Evidence syntheses are crucial in healthcare and elsewhere but are resource-intensive, often taking years to produce. Artificial intelligence and machine learning (AI/ML) tools may improve production efficiency in certain review phases, but little is known about their impact on entire reviews.
Methods
We performed prespecified analyses of a convenience sample of eligible healthcare- or welfare-related reviews commissioned at the Norwegian Institute of Public Health between August 1, 2020 (first commission to use AI/ML) and January 31, 2023 (administrative cut-off). The main exposures were AI/ML use following an internal support team's recommendation versus no use. Ranking (e.g., priority screening), classification (e.g., study design), clustering (e.g., documents), and bibliometric analysis (e.g., OpenAlex) tools were included, but we did not include or exclude specific tools. Generative AI tools were not widely available during the study period. The outcomes were resources (person-hours) and time from commission to completion (approval for delivery, including peer review; weeks). Analyses accounted for nonrandomized assignment and censored outcomes (reviews ongoing at cut-off). Researchers classifying exposures were blinded to outcomes. The statistician was blinded to exposure.
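To illustrate what a censored outcome means here, the following minimal sketch shows how a review still ongoing at the administrative cut-off contributes only a lower bound on its time-to-completion. The function name and the two example reviews are hypothetical and are not taken from the study data; only the cut-off date comes from the Methods. The abstract does not state which model the authors used to handle censoring.

```python
from datetime import date
from typing import Optional, Tuple

# Administrative cut-off stated in the Methods.
CUTOFF = date(2023, 1, 31)


def weeks_to_completion(
    commissioned: date,
    completed: Optional[date],
    cutoff: date = CUTOFF,
) -> Tuple[float, bool]:
    """Return (observed weeks, completed flag) for one review.

    If the review was completed on or before the cut-off, the observed
    time is the true time-to-completion and the flag is True. Otherwise
    the review is right-censored: the observed time runs only to the
    cut-off and the flag is False.
    """
    if completed is not None and completed <= cutoff:
        return (completed - commissioned).days / 7, True
    return (cutoff - commissioned).days / 7, False


# Hypothetical reviews, for illustration only.
finished = weeks_to_completion(date(2021, 3, 1), date(2021, 9, 13))
ongoing = weeks_to_completion(date(2022, 11, 1), None)

print(finished)  # completed review: exact duration
print(ongoing)   # ongoing review: duration known only to exceed this value
```

A time-to-event analysis would take the (weeks, flag) pairs as input, so that ongoing reviews are neither dropped nor treated as if they finished at the cut-off.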
Results
Among 39 reviews, 7 (18%) were health technology assessments rather than systematic reviews, 19 (49%) focused on healthcare rather than welfare, 18 (46%) planned a meta-analysis, and 3 (8%) were ongoing at cut-off. AI/ML tools were used in 27 (69%) reviews. Reviews that used AI/ML as recommended used more resources (mean 667 vs. 291 person-hours) but were completed slightly faster (27.6 vs. 28.2 weeks). These differences were not statistically significant (relative resource use: 3.71; 95% CI: 0.36–37.95; p = 0.269; relative time-to-completion: 0.92; 95% CI: 0.53–1.58; p = 0.753).
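As a reading aid, the reported ratios, confidence intervals, and p-values can be checked for internal consistency. The sketch below assumes the intervals are symmetric on the log scale and uses a normal (Wald) approximation; this is an assumption for illustration, since the abstract does not describe the authors' model. The function name `wald_check` is hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% critical value of the standard normal


def wald_check(ratio: float, lower: float, upper: float) -> tuple:
    """Back-compute the log-scale standard error, z statistic, and
    two-sided p-value from a ratio and its 95% confidence interval,
    assuming the interval is ratio * exp(+/- Z_95 * SE)."""
    log_se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    z = math.log(ratio) / log_se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return log_se, z, p


# Values as reported in the Results.
_, _, p_resources = wald_check(3.71, 0.36, 37.95)
_, _, p_time = wald_check(0.92, 0.53, 1.58)

print(f"relative resource use: p = {p_resources:.3f} (reported 0.269)")
print(f"relative time-to-completion: p = {p_time:.3f} (reported 0.753)")
```

Under these assumptions the recomputed p-values are roughly 0.27 and 0.76, close to the reported values; small gaps are expected because the inputs are rounded and the authors' method may differ. The very wide interval for resource use (0.36 to 37.95) is what makes the large difference in mean person-hours compatible with no association.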
Conclusions
Associations between AI/ML use and the outcomes remain uncertain. Multicenter studies or meta-analyses may be needed to determine if these tools meaningfully reduce resource use and time to produce evidence syntheses.