Development and Performance of a Large Language Model for the Quality Evaluation of Multi-Language Medical Imaging Guidelines and Consensus

IF 3.6; Zone 2, Medicine; JCR Q1, MEDICINE, GENERAL & INTERNAL
Zhixiang Wang, Jing Sun, Hui Liu, Xufei Luo, Jia Li, Wenjun He, Zhenhua Yang, Han Lv, Yaolong Chen, Zhenchang Wang
Journal of Evidence‐Based Medicine, vol. 18, no. 2. Journal Article. Published 2025-04-03.
DOI: 10.1111/jebm.70020
https://onlinelibrary.wiley.com/doi/10.1111/jebm.70020
Citations: 0

Abstract

Aim

This study aimed to develop and evaluate an automated large language model (LLM)-based system for assessing the quality of medical imaging guidelines and consensus (GACS) in different languages, with a focus on improving evaluation efficiency and consistency and on reducing manual workload.

Method

We developed the QPC-HASE-GuidelineEval algorithm, which integrates a Four-Quadrant Questions Classification Strategy and Hybrid Search Enhancement. The model was validated on 45 medical imaging guidelines (36 in Chinese and 9 in English) published in 2021 and 2022. Key evaluation metrics included consistency with expert assessments, hybrid search paragraph matching accuracy, information completeness, comparisons of different paragraph matching approaches, and cost-time efficiency.
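The abstract does not describe how Hybrid Search Enhancement is implemented. As a rough illustration only, one common way to build a hybrid paragraph matcher is to blend a keyword-overlap score with a vector-similarity score. The sketch below assumes this generic scheme; the function names, the weight `alpha`, and the use of term-count vectors in place of learned embeddings are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; a real system would use a proper tokenizer."""
    return text.lower().split()


def keyword_score(query, paragraph):
    """Fraction of distinct query terms that appear verbatim in the paragraph."""
    q_terms = set(tokenize(query))
    p_terms = set(tokenize(paragraph))
    return len(q_terms & p_terms) / len(q_terms) if q_terms else 0.0


def cosine_score(query, paragraph):
    """Cosine similarity of term-count vectors (a stand-in for embedding similarity)."""
    q, p = Counter(tokenize(query)), Counter(tokenize(paragraph))
    dot = sum(q[t] * p[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in p.values())
    )
    return dot / norm if norm else 0.0


def hybrid_rank(query, paragraphs, alpha=0.5):
    """Rank paragraphs by a weighted mix of both scores; alpha is a hypothetical weight."""
    scored = [
        (alpha * keyword_score(query, p) + (1 - alpha) * cosine_score(query, p), p)
        for p in paragraphs
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy guideline paragraphs, invented for illustration.
paragraphs = [
    "The guideline was funded by a national research grant.",
    "Recommendations were graded using the GRADE approach.",
    "Conflicts of interest of all panel members were declared.",
]
ranked = hybrid_rank("how were recommendations graded", paragraphs)
best_paragraph = ranked[0][1]
print(best_paragraph)
```

In a setup like this, the top-ranked paragraph would be passed to the LLM together with the evaluation question, so that the model judges each quality item against the most relevant passage rather than the whole guideline.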

Results

The algorithm demonstrated an average accuracy of 77%, excelling in simpler tasks but showing lower accuracy (29%–40%) in complex evaluations, such as explanations and visual aids. The average accuracy rates of the English and Chinese versions of the GACS were 74% and 76%, respectively (p = 0.37). Hybrid search demonstrated superior performance with paragraph matching accuracy (4.42) and information completeness (4.42), significantly outperforming keyword-based search (1.05/1.05) and sparse-dense retrieval (4.26/3.63). The algorithm significantly reduced evaluation time to 8 min and 30 s per guideline and reduced costs to approximately 0.5 USD per guideline, offering a considerable advantage over traditional manual methods.
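To put the efficiency figures in context, the short calculation below scales the reported per-guideline time and cost to the 45-guideline validation set. It assumes the guidelines are processed one after another and that the reported values (8 min 30 s and approximately 0.5 USD) apply uniformly to every guideline, which the abstract does not state.

```python
# Figures taken from the abstract; uniform per-guideline cost is an assumption.
GUIDELINES = 45
SECONDS_PER_GUIDELINE = 8 * 60 + 30  # 8 min 30 s
USD_PER_GUIDELINE = 0.5              # "approximately 0.5 USD"

total_minutes = GUIDELINES * SECONDS_PER_GUIDELINE / 60
total_usd = GUIDELINES * USD_PER_GUIDELINE

print(f"{total_minutes:.1f} min, {total_usd:.2f} USD")  # 382.5 min, 22.50 USD
```

Under these assumptions the whole validation set would take roughly 6.4 hours of machine time and cost about 22.5 USD.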

Conclusion

The QPC-HASE-GuidelineEval algorithm, powered by LLMs, showed strong potential for improving the efficiency, scalability, and multi-language capability of guideline evaluations, though further enhancements are needed to handle more complex tasks that require deeper interpretation.

Source journal
Journal of Evidence‐Based Medicine (MEDICINE, GENERAL & INTERNAL)
CiteScore: 11.20
Self-citation rate: 1.40%
Articles published: 42
About the journal: The Journal of Evidence-Based Medicine (EMB) is an esteemed international healthcare and medical decision-making journal, dedicated to publishing groundbreaking research outcomes in evidence-based decision-making, research, practice, and education. Serving as the official English-language journal of the Cochrane China Centre and West China Hospital of Sichuan University, we eagerly welcome editorials, commentaries, and systematic reviews encompassing various topics such as clinical trials, policy, drug and patient safety, education, and knowledge translation.