Integrating large language models in systematic reviews: a framework and case study using ROBINS-I for risk of bias assessment

Impact factor: 9 · Region 3 (Medicine) · JCR Q1: MEDICINE, GENERAL & INTERNAL
Bashar Hasan, Samer Saadi, Noora S Rajjoub, Moustafa Hegazi, Mohammad Al-Kordi, Farah Fleti, Magdoleen Farah, Irbaz B Riaz, Imon Banerjee, Zhen Wang, Mohammad Hassan Murad
Journal: BMJ Evidence-Based Medicine
Published: 2024-02-21 (Journal Article)
DOI: 10.1136/bmjebm-2023-112597 (https://doi.org/10.1136/bmjebm-2023-112597)
Citations: 0

Abstract

Large language models (LLMs) may facilitate and expedite systematic reviews, although the approach to integrate LLMs in the review process is unclear. This study evaluates GPT-4 agreement with human reviewers in assessing the risk of bias using the Risk Of Bias In Non-randomised Studies of Interventions (ROBINS-I) tool and proposes a framework for integrating LLMs into systematic reviews. The case study demonstrated that raw per cent agreement was the highest for the ROBINS-I domain of ‘Classification of Intervention’. Kendall agreement coefficient was highest for the domains of ‘Participant Selection’, ‘Missing Data’ and ‘Measurement of Outcomes’, suggesting moderate agreement in these domains. Raw agreement about the overall risk of bias across domains was 61% (Kendall coefficient=0.35). The proposed framework for integrating LLMs into systematic reviews consists of four domains: rationale for LLM use, protocol (task definition, model selection, prompt engineering, data entry methods, human role and success metrics), execution (iterative revisions to the protocol) and reporting. We identify five basic task types relevant to systematic reviews: selection, extraction, judgement, analysis and narration. Considering the agreement level with a human reviewer in the case study, pairing artificial intelligence with an independent human reviewer remains required. Data are available upon reasonable request. Search strategy, selection process flowchart, prompts and boxes containing included SRs and studies are available in the appendix. Analysed datasheet is available upon request.
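The two agreement measures reported above can be illustrated with a minimal sketch. It rests on several assumptions: the abstract does not say which Kendall statistic was computed, so Kendall's tau-b (a rank correlation with tie correction) is used here as a stand-in; the ordinal coding in `LEVELS`, the function names and the example ratings are all hypothetical and are not taken from the study.

```python
from itertools import combinations

# Hypothetical ordinal coding of ROBINS-I judgements (an assumption for this sketch).
LEVELS = {"low": 0, "moderate": 1, "serious": 2, "critical": 3}


def raw_agreement(a, b):
    """Share of items on which both raters gave the same judgement."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def kendall_tau_b(a, b):
    """Kendall's tau-b between two lists of ordinal ratings (tie-corrected)."""
    if len(a) != len(b):
        raise ValueError("ratings must be of equal length")
    x = [LEVELS[v] for v in a]
    y = [LEVELS[v] for v in b]
    concordant = discordant = only_x_tied = only_y_tied = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # pair tied for both raters: excluded from both denominator terms
        if dx == 0:
            only_x_tied += 1
        elif dy == 0:
            only_y_tied += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = ((untied + only_x_tied) * (untied + only_y_tied)) ** 0.5
    if denom == 0:
        return float("nan")  # undefined when a rater gives one constant rating
    return (concordant - discordant) / denom


# Made-up ratings for five studies, for illustration only.
human = ["low", "moderate", "serious", "moderate", "low"]
model = ["low", "moderate", "moderate", "serious", "low"]

print(f"raw agreement: {raw_agreement(human, model):.2f}")
print(f"Kendall tau-b: {kendall_tau_b(human, model):.3f}")
```

Raw agreement counts only exact matches, while a rank-based coefficient also reflects whether disagreements preserve the ordering of judgements, which is one reason the two measures can favour different domains.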
Source journal
BMJ Evidence-Based Medicine (MEDICINE, GENERAL & INTERNAL)
CiteScore: 8.90
Self-citation rate: 3.40%
Articles published: 48
About the journal: BMJ Evidence-Based Medicine (BMJ EBM) publishes original evidence-based research, insights and opinions on what matters for health care. We focus on the tools, methods, and concepts that are basic and central to practising evidence-based medicine and deliver relevant, trustworthy and impactful evidence. BMJ EBM is a Plan S compliant Transformative Journal and adheres to the highest possible industry standards for editorial policies and publication ethics.