James J Butler, Ernesto Acosta, Michael C Kuna, Michael C Harrington, Andrew J Rosenbaum, Michael T Mulligan, John G Kennedy
Decoding Radiology Reports: Artificial Intelligence-Large Language Models Can Improve the Readability of Hand and Wrist Orthopedic Radiology Reports.
HAND, pp. 1144-1152. Published 2025-10-01 (Epub 2024-08-13). DOI: 10.1177/15589447241267766
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11574816/pdf/
Abstract
Background: The purpose of this study was to assess the effectiveness of an Artificial Intelligence-Large Language Model (AI-LLM) at improving the readability of hand and wrist radiology reports.
Methods: The radiology reports of 100 hand and/or wrist radiographs, 100 hand and/or wrist computed tomography (CT) scans, and 100 hand and/or wrist magnetic resonance imaging (MRI) scans were extracted. The following prompt was entered into the AI-LLM: "Explain this radiology report to a patient in layman's terms in the second person: [Report Text]." The report length, Flesch reading ease score (FRES), and Flesch-Kincaid reading level (FKRL) were calculated for each original radiology report and its AI-LLM-generated counterpart. The accuracy of each AI-LLM-generated report was assessed on a 5-point Likert scale, and any "hallucination" it contained was recorded.
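Both readability metrics named in the Methods are fixed formulas over word, sentence, and syllable counts. The abstract does not say which software the authors used, so the following is only a minimal Python sketch, assuming the standard published Flesch reading ease and Flesch-Kincaid grade-level coefficients (the FKRL is presumably the grade-level formula) and a crude vowel-group syllable counter of our own. Dedicated readability tools will give somewhat different numbers. Higher FRES means easier text; the grade level approximates a U.S. school grade.

```python
import re


def count_syllables(word: str) -> int:
    """Crude syllable estimate: count runs of vowels (a, e, i, o, u, y).

    Assumption: a trailing silent 'e' (but not '-le') is dropped. This is a
    simplification; dictionary-based tools are more accurate.
    """
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)  # every word has at least one syllable


def readability(text: str) -> tuple[float, float]:
    """Return (Flesch reading ease, Flesch-Kincaid grade level) for `text`."""
    # Naive sentence split on terminal punctuation; decimals such as "2.5 mm"
    # would be over-split, which a production tool would need to handle.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are letter runs, optionally with an apostrophe ("layman's").
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    # Standard published coefficients for both Flesch formulas.
    fres = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    fkgl = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return fres, fkgl


if __name__ == "__main__":
    fres, grade = readability("The cat sat. It was happy.")
    print(f"FRES = {fres:.1f}, grade level = {grade:.1f}")
```

When comparing an original report with its AI-LLM rewrite, scoring both with the same function keeps any counting bias consistent across the pair, which matters more for a before/after comparison than the absolute accuracy of the syllable heuristic.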
Results: There was a statistically significant improvement in mean FRES and FKRL for the AI-LLM-generated radiograph, CT, and MRI reports. For all AI-LLM-generated reports, the mean reading level fell below an eighth-grade level. The mean Likert scores for the AI-LLM-generated radiograph, CT, and MRI reports were 4.1 ± 0.6, 3.9 ± 0.6, and 3.9 ± 0.7, respectively. The hallucination rates in the AI-LLM-generated radiograph, CT, and MRI reports were 3%, 6%, and 6%, respectively.
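The Results summarize per-report accuracy ratings as mean ± standard deviation and hallucinations as a percentage of reports. The sketch below shows that arithmetic in Python under two assumptions the abstract does not state: the SD is a sample SD, and each report is counted once as either hallucinated or not. The function names `summarize_likert` and `hallucination_rate` are hypothetical.

```python
from statistics import mean, stdev


def summarize_likert(scores: list[int]) -> str:
    """Format 1-5 Likert ratings as 'mean ± SD' (sample SD is an assumption)."""
    if len(scores) < 2:
        raise ValueError("need at least two ratings for a sample SD")
    if any(s < 1 or s > 5 for s in scores):
        raise ValueError("Likert ratings must lie between 1 and 5")
    return f"{mean(scores):.1f} ± {stdev(scores):.1f}"


def hallucination_rate(flags: list[bool]) -> float:
    """Percentage of reports flagged as containing at least one hallucination."""
    if not flags:
        raise ValueError("no reports given")
    return 100.0 * sum(flags) / len(flags)
```

With 100 reports per modality, `hallucination_rate([True] * 3 + [False] * 97)` returns 3.0, so under the per-report assumption the 3% radiograph figure corresponds to three flagged reports.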
Conclusions: This study demonstrates that an AI-LLM effectively improves the readability of hand and wrist radiology reports, underscoring its potential as a promising, innovative, patient-centric strategy for improving patients' comprehension of their imaging reports.
Level of Evidence: IV.
About the journal:
HAND, the official journal of the American Association for Hand Surgery, is a peer-reviewed journal featuring articles by clinicians worldwide on current research and clinical work in hand surgery. It covers all aspects of hand and upper extremity surgery, as well as the postoperative care and rehabilitation of the hand.