Large language models for efficient whole-organ MRI score-based reports and categorization in knee osteoarthritis.

IF 4.5 | Zone 2 (Medicine) | Q1 Radiology, Nuclear Medicine & Medical Imaging

Insights into Imaging. Pub Date: 2025-05-14. DOI: 10.1186/s13244-025-01976-w

Yuxue Xie, Zhonghua Hu, Hongyue Tao, Yiwen Hu, Haoyu Liang, Xinmin Lu, Lei Wang, Xiangwen Li, Shuang Chen

Insights into Imaging 16(1):100. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12078906/pdf/

Citations: 0

Abstract

Objectives: To evaluate the performance of large language models (LLMs) in automatically generating whole-organ MRI score (WORMS)-based structured MRI reports and predicting osteoarthritis (OA) severity for the knee.

Methods: A total of 160 consecutive patients with suspected OA were included. Three radiologists reviewed the knee MRI reports to establish the WORMS reference standard for 39 key features. GPT-4o and GPT-4o-mini were prompted using in-context knowledge (ICK) and chain-of-thought (COT) to generate WORMS-based structured reports from the original reports and to automatically predict OA severity. Four orthopedic surgeons reviewed the original and LLM-generated reports in pairwise preference and difficulty tests, and their review times were recorded.
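As an illustration only, the combination of in-context knowledge and chain-of-thought prompting described in the Methods could be assembled along the following lines. The abstract does not give the study's actual prompts, so the function name `build_prompt`, the wording of every instruction, and the sample inputs are all hypothetical, and no model API is called.

```python
def build_prompt(report_text: str, knowledge: str = "", use_cot: bool = False) -> str:
    """Assemble a single prompt string (hypothetical wording, not the study's prompt)."""
    parts = [
        "Convert the knee MRI report below into a WORMS-based structured report "
        "and state the knee laterality."
    ]
    if knowledge:
        # In-context knowledge (ICK): reference material placed directly in the prompt.
        parts.append("Reference knowledge:\n" + knowledge)
    if use_cot:
        # Chain-of-thought (COT): ask for intermediate reasoning before the final answer.
        parts.append(
            "Reason step by step through each feature before giving the final "
            "structured report and the OA severity category."
        )
    parts.append("Original report:\n" + report_text)
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Placeholder inputs for demonstration; not taken from the study.
    demo = build_prompt(
        report_text="<free-text knee MRI report>",
        knowledge="<WORMS scoring definitions>",
        use_cot=True,
    )
    print(demo)
```

Keeping the ICK and COT parts as independent switches makes it straightforward to compare prompt strategies against the same report text, which is the kind of comparison the Results refer to.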

Results: GPT-4o extracted the laterality of the knee perfectly (accuracy = 100%). GPT-4o outperformed GPT-4o-mini in generating WORMS reports (accuracy: 93.9% vs 76.2%). GPT-4o achieved higher recall (87.3% vs 46.7%, p < 0.001) and higher precision (94.2% vs 71.2%, p < 0.001) than GPT-4o-mini. For predicting OA severity, GPT-4o outperformed GPT-4o-mini across all prompt strategies (best accuracy: 98.1% vs 68.7%). Surgeons found it easier to extract information from LLM-generated reports and preferred them over the original reports (both p < 0.001), while spending less time on each report (51.27 ± 9.41 s vs 87.42 ± 20.26 s, p < 0.001).
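For readers less familiar with the metrics reported above, the sketch below shows how accuracy, precision, and recall are conventionally computed when extracted features are compared with a reference standard. It assumes each feature is reduced to present/absent and that a feature the model omits counts as absent; the abstract does not say how the study scored individual WORMS features, and the function name `score` and the feature labels are illustrative.

```python
def score(reference: dict[str, bool], predicted: dict[str, bool]) -> dict[str, float]:
    """Compare predicted feature flags with a reference standard (binary assumption)."""
    tp = fp = fn = tn = 0
    for feature, truth in reference.items():
        guess = predicted.get(feature, False)  # assumption: omitted feature = absent
        if truth and guess:
            tp += 1  # feature present and reported
        elif guess:
            fp += 1  # feature reported but absent in the reference
        elif truth:
            fn += 1  # feature present but missed
        else:
            tn += 1  # feature absent and not reported
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    # Toy example with made-up labels; not study data.
    reference = {"feature_a": True, "feature_b": True, "feature_c": False, "feature_d": False}
    predicted = {"feature_a": True, "feature_c": True, "feature_d": False}
    print(score(reference, predicted))
```

Under this convention, a model with low recall and moderate precision misses many findings that are in the reference standard while being right more often than not about the findings it does report.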

Conclusion: GPT-4o generated expert multi-feature, WORMS-based reports from original free-text knee MRI reports. GPT-4o with COT achieved high accuracy in categorizing OA severity. Surgeons reported greater preference and higher efficiency when using LLM-generated reports.

Critical relevance statement: The perfect performance of generating WORMS-based reports and the high efficiency and ease of use suggest that integrating LLMs into clinical workflows could greatly enhance productivity and alleviate the documentation burden faced by clinicians in knee OA.

Key points: GPT-4o successfully generated WORMS-based knee MRI reports. GPT-4o with COT prompting achieved impressive accuracy in categorizing knee OA severity. Greater preference and higher efficiency were reported for LLM-generated reports.

Source journal

Insights into Imaging (Medicine: Radiology, Nuclear Medicine and Imaging)

CiteScore: 7.30

Self-citation rate: 4.30%

Articles published: 182

Review time: 13 weeks

About the journal: Insights into Imaging (I³) is a peer-reviewed open access journal published under the brand SpringerOpen. All content published in the journal is freely available online to anyone, anywhere. I³ continuously updates scientific knowledge and progress in best-practice standards in radiology through the publication of original articles and state-of-the-art reviews and opinions, along with recommendations and statements from the leading radiological societies in Europe. Founded by the European Society of Radiology (ESR), I³ creates a platform for educational material, guidelines and recommendations, and a forum for topics of controversy. A balanced combination of review articles, original papers, short communications from European radiological congresses and information on society matters makes I³ an indispensable source for current information in this field. I³ is owned by the ESR; however, authors retain copyright to their articles according to the Creative Commons Attribution License (see Copyright and License Agreement). All articles can be read, redistributed and reused for free, as long as the author of the original work is cited properly. The open access fees (article-processing charges) for this journal are sponsored by ESR for all members. The journal went open access in 2012, which means that all articles published since then are freely available online.