Adherence of Studies on Large Language Models for Medical Applications Published in Leading Medical Journals According to the MI-CLEAR-LLM Checklist.

IF 4.4 | Zone 2 (Medicine) | Q1 Radiology, Nuclear Medicine & Medical Imaging

Korean Journal of Radiology Pub Date : 2025-04-01 Epub Date: 2025-01-23 DOI:10.3348/kjr.2024.1161

Ji Su Ko, Hwon Heo, Chong Hyun Suh, Jeho Yi, Woo Hyun Shim

Pages: 304-312. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11955383/pdf/

Citations: 0

Abstract

Objective: To evaluate the adherence of large language model (LLM)-based healthcare research to the Minimum Reporting Items for Clear Evaluation of Accuracy Reports of Large Language Models in Healthcare (MI-CLEAR-LLM) checklist, a framework designed to enhance the transparency and reproducibility of studies on the accuracy of LLMs for medical applications.

Materials and methods: A systematic PubMed search was conducted to identify articles on LLM performance published in high-ranking clinical medicine journals (the top 10% in each of the 59 specialties according to the 2023 Journal Impact Factor) from November 30, 2022, through June 25, 2024. Two reviewers independently extracted data on the six MI-CLEAR-LLM checklist items: 1) identification and specification of the LLM used, 2) stochasticity handling, 3) prompt wording and syntax, 4) prompt structuring, 5) prompt testing and optimization, and 6) independence of the test data. Adherence was then calculated for each item.

Results: Of 159 studies, 100% (159/159) reported the name of the LLM, 96.9% (154/159) reported the version, and 91.8% (146/159) reported the manufacturer. However, only 54.1% (86/159) reported the training data cutoff date, 6.3% (10/159) documented access to web-based information, and 50.9% (81/159) provided the date of the query attempts. Clear documentation regarding stochasticity management was provided in 15.1% (24/159) of the studies. Regarding prompt details, 49.1% (78/159) provided exact prompt wording and syntax but only 34.0% (54/159) documented prompt-structuring practices. While 46.5% (74/159) of the studies detailed prompt testing, only 15.7% (25/159) explained the rationale for specific word choices. Test data independence was reported for only 13.2% (21/159) of the studies, and 56.6% (43/76) provided URLs for internet-sourced test data.
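The adherence figures above are simple proportions, so they can be re-derived from the reported counts. The following is a minimal sketch, not the authors' analysis code: the item labels, the `REPORTED` table, and the `adherence_pct` helper are hypothetical names introduced here for illustration, while the counts and percentages are copied from the Results paragraph. It assumes percentages were rounded to one decimal place.

```python
# Counts and percentages copied from the Results paragraph.
# Each entry: label -> (studies adhering, studies assessed, reported %).
# Labels and structure are illustrative assumptions, not the authors' schema.
REPORTED = {
    "LLM name": (159, 159, 100.0),
    "LLM version": (154, 159, 96.9),
    "LLM manufacturer": (146, 159, 91.8),
    "Training data cutoff date": (86, 159, 54.1),
    "Access to web-based information": (10, 159, 6.3),
    "Date of query attempts": (81, 159, 50.9),
    "Stochasticity management": (24, 159, 15.1),
    "Exact prompt wording and syntax": (78, 159, 49.1),
    "Prompt-structuring practices": (54, 159, 34.0),
    "Prompt testing": (74, 159, 46.5),
    "Rationale for word choices": (25, 159, 15.7),
    "Test data independence": (21, 159, 13.2),
    "URLs for internet-sourced test data": (43, 76, 56.6),
}


def adherence_pct(adhering: int, assessed: int) -> float:
    """Return adherence as a percentage rounded to one decimal place."""
    if assessed <= 0:
        raise ValueError("assessed must be positive")
    return round(100 * adhering / assessed, 1)


def mismatches(table: dict) -> list:
    """List the items whose recomputed percentage differs from the reported one."""
    return [
        label
        for label, (adhering, assessed, reported) in table.items()
        if adherence_pct(adhering, assessed) != reported
    ]


if __name__ == "__main__":
    for label, (adhering, assessed, reported) in REPORTED.items():
        pct = adherence_pct(adhering, assessed)
        print(f"{label}: {adhering}/{assessed} = {pct}% (reported {reported}%)")
```

Note that the last item uses a denominator of 76 rather than 159, because it applies only to the studies that used internet-sourced test data.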

Conclusion: Although basic LLM identification details were relatively well reported, other key aspects, including stochasticity, prompts, and test data, were frequently underreported. Enhancing adherence to the MI-CLEAR-LLM checklist will allow LLM research to achieve greater transparency and will foster more credible and reliable future studies.


Source journal

Korean Journal of Radiology (category: Medicine, Nuclear Medicine)

CiteScore: 10.60

Self-citation rate: 12.50%

Articles published: 141

Review time: 1.3 months

About the journal: The inaugural issue of the Korean J Radiol came out in March 2000. The journal aims to produce and propagate knowledge on radiologic imaging and related sciences. A distinctive feature of its articles is that they reflect global trends in radiology combined with an East Asian perspective. Geographic differences in disease prevalence are reflected in the contents of the papers, which serves to enrich the body of knowledge. Outstanding radiologists from many countries serve on the journal's editorial board.