Utilizing Large language models to select literature for meta-analysis shows workload reduction while maintaining a similar recall level as manual curation.

IF 3.9 · CAS Region 3 (Medicine) · JCR Q1 HEALTH CARE SCIENCES & SERVICES
Xiangming Cai, Yuanming Geng, Yiming Du, Bart Westerman, Duolao Wang, Chiyuan Ma, Juan J Garcia Vallejo
DOI: 10.1186/s12874-025-02569-3
Journal: BMC Medical Research Methodology, 25(1): 116
Published: 2025-04-28 (Journal Article)
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12036192/pdf/
Citations: 0

Abstract

Background: Large language models (LLMs) such as ChatGPT have shown great potential for aiding medical research. Evidence-based medicine, and meta-analysis in particular, involves a heavy workload in filtering records, yet few studies have tried to use LLMs to help screen records for meta-analysis.

Objective: In this research, we aimed to explore whether multiple LLMs can be incorporated to facilitate the title- and abstract-based screening of records during meta-analysis.

Methods: We evaluated several LLMs, including GPT-3.5, GPT-4, Deepseek-R1-Distill, Qwen-2.5, Phi-4, Llama-3.1, Gemma-2, and Claude-2. To assess our strategy, we selected three meta-analyses from the literature, together with a glioma meta-analysis embedded in the study as additional validation. For the automatic selection of records from curated meta-analyses, we developed a four-step strategy called LARS-GPT, consisting of (1) criteria selection and single-prompt (a prompt with one criterion) creation, (2) best-combination identification, (3) combined-prompt (a prompt with one or more criteria) creation, and (4) request sending and answer summarization. Recall, workload reduction, precision, and F1 score were calculated to assess the performance of LARS-GPT.
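The four evaluation metrics named above can be computed directly from per-record screening decisions. The sketch below is a minimal illustration, not the authors' code: `llm_keep` and `human_keep` are assumed per-record booleans (True = retained for full-text review), with the human labels as the gold standard.

```python
def screening_metrics(llm_keep, human_keep):
    """Compute recall, precision, F1, and workload reduction for an
    LLM-based title/abstract screen against human gold-standard labels."""
    tp = sum(l and h for l, h in zip(llm_keep, human_keep))
    fp = sum(l and not h for l, h in zip(llm_keep, human_keep))
    fn = sum(h and not l for l, h in zip(llm_keep, human_keep))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Workload reduction: the fraction of records the LLM excludes,
    # i.e. records a human reviewer no longer needs to screen manually.
    workload_reduction = 1 - sum(llm_keep) / len(llm_keep)
    return {"recall": recall, "precision": precision,
            "f1": f1, "workload_reduction": workload_reduction}
```

Recall is the metric to protect in this setting: a missed eligible study (false negative) biases the meta-analysis, whereas a false positive only costs one extra full-text review.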

Results: Performance varied across single-prompts, with a mean recall of 0.800. Based on these single-prompts, we were able to find combinations that performed better than the pre-set threshold. With the best combination of criteria identified, LARS-GPT achieved a 40.1% workload reduction on average while maintaining a recall greater than 0.9.
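The search for a "best combination" described above can be sketched as an exhaustive scan over subsets of single-prompt criteria, keeping only the subsets whose recall clears the pre-set threshold and, among those, the one with the largest workload reduction. This is an illustrative assumption about the selection logic, not the paper's implementation; a record is kept here only if every criterion in the combination keeps it.

```python
from itertools import combinations

def combine(decision_sets):
    # Keep a record only if all criteria in the combination keep it.
    return [all(d) for d in zip(*decision_sets)]

def best_combination(per_criterion, human_keep, min_recall=0.9):
    """Scan all subsets of single-prompt decisions; return the subset
    with the highest workload reduction whose recall >= min_recall."""
    best, best_wr = None, -1.0
    for r in range(1, len(per_criterion) + 1):
        for combo in combinations(range(len(per_criterion)), r):
            keep = combine([per_criterion[i] for i in combo])
            tp = sum(k and h for k, h in zip(keep, human_keep))
            fn = sum(h and not k for k, h in zip(keep, human_keep))
            recall = tp / (tp + fn) if tp + fn else 0.0
            wr = 1 - sum(keep) / len(keep)
            if recall >= min_recall and wr > best_wr:
                best, best_wr = combo, wr
    return best, best_wr
```

Intersecting criteria raises workload reduction (more records excluded) but can only lower recall, which is why the recall constraint must be checked for every candidate combination rather than assumed from the single-prompt results.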

Conclusions: We show that automatic selection of literature for meta-analysis is possible with LLMs. We provide the approach as a pipeline, LARS-GPT, which achieves a substantial workload reduction while maintaining a pre-set recall.

Source journal
BMC Medical Research Methodology (Medicine – Health Care Sciences)
CiteScore: 6.50
Self-citation rate: 2.50%
Articles per year: 298
Review time: 3-8 weeks
Journal description: BMC Medical Research Methodology is an open access journal publishing original peer-reviewed research articles in methodological approaches to healthcare research. Articles on the methodology of epidemiological research, clinical trials and meta-analysis/systematic review are particularly encouraged, as are empirical studies of the associations between choice of methodology and study outcomes. BMC Medical Research Methodology does not aim to publish articles describing scientific methods or techniques: these should be directed to the BMC journal covering the relevant biomedical subject area.