Decoding Radiology Reports: Artificial Intelligence-Large Language Models Can Improve the Readability of Hand and Wrist Orthopedic Radiology Reports.

Impact factor: 1.8 | JCR: Q2 (Orthopedics)

HAND | Pub Date: 2025-10-01 | Epub Date: 2024-08-13 | DOI: 10.1177/15589447241267766

James J Butler, Ernesto Acosta, Michael C Kuna, Michael C Harrington, Andrew J Rosenbaum, Michael T Mulligan, John G Kennedy

Pages: 1144-1152 | Publication type: Journal Article | Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11574816/pdf/

Citations: 0

Abstract

Background: The purpose of this study was to assess the effectiveness of an Artificial Intelligence-Large Language Model (AI-LLM) at improving the readability of hand and wrist radiology reports.

Methods: The radiology reports of 100 hand and/or wrist radiographs, 100 hand and/or wrist computed tomography (CT) scans, and 100 hand and/or wrist magnetic resonance imaging (MRI) scans were extracted. The following prompt command was inserted into the AI-LLM: "Explain this radiology report to a patient in layman's terms in the second person: [Report Text]." The report length, Flesch reading ease score (FRES), and Flesch-Kincaid reading level (FKRL) were calculated for the original radiology report and the AI-LLM-generated report. The accuracy of the AI-LLM report was assessed via a 5-point Likert scale. Any "hallucination" produced by the AI-LLM-generated report was recorded.
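The two readability metrics named in the Methods can be illustrated with a minimal Python sketch. It uses the standard published formulas for the Flesch reading ease score and the Flesch-Kincaid grade level. The abstract does not say which tool the authors used, so the syllable counter (a crude vowel-group heuristic), the sentence splitter, and the function names `count_syllables`, `readability`, and `build_prompt` are all assumptions made for illustration. Only the prompt wording is taken from the abstract.

```python
import re

# Prompt wording quoted from the abstract; the placeholder name is ours.
PROMPT_TEMPLATE = (
    "Explain this radiology report to a patient in layman's terms "
    "in the second person: {report_text}"
)


def build_prompt(report_text: str) -> str:
    """Fill the study's prompt with one report (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(report_text=report_text)


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups, drop a silent final 'e'.

    This is a heuristic assumption, not the counter used in the study.
    """
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return report length in words, FRES, and FKRL for a block of text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)

    return {
        "words": len(words),
        # Flesch reading ease: higher means easier to read.
        "fres": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        # Flesch-Kincaid grade level: approximate US school grade.
        "fkrl": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
    }
```

Calling `readability` on an original report and on the text returned for `build_prompt(report)` would give the before-and-after values the study compares. Because the syllable heuristic is approximate, the scores from this sketch would likely differ somewhat from those produced by a dedicated readability tool.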

Results: There was a statistically significant improvement in mean FRES scores and FKRL scores in the AI-LLM-generated radiograph report, CT report, and MRI report. For all AI-LLM-generated reports, the mean reading level improved to below an eighth-grade reading level. The mean Likert score for the AI-LLM-generated radiograph report, CT report, and MRI report was 4.1 ± 0.6, 3.9 ± 0.6, and 3.9 ± 0.7, respectively. The hallucination rate in the AI-LLM-generated radiograph report, CT report, and MRI report was 3%, 6%, and 6%, respectively.

Conclusions: This study demonstrates that AI-LLM effectively improves the readability of hand and wrist radiology reports, underscoring the potential application of AI-LLM as a promising and innovative patient-centric strategy to improve patient comprehension of their imaging reports.

Level of Evidence: IV.

Source journal

HAND Medicine-Surgery

CiteScore: 3.30
Self-citation rate: 0.00%
Articles published: 209

Journal description: HAND is the official journal of the American Association for Hand Surgery. It is a peer-reviewed journal featuring articles by clinicians worldwide that present current research and clinical work in hand surgery. It covers all aspects of hand and upper extremity surgery, as well as the postoperative care and rehabilitation of the hand.