Readability versus accuracy in LLM-transformed radiology reports: stakeholder preferences across reading grade levels
Hong-Seon Lee, Sungjun Kim, Songsoo Kim, Jeongrok Seo, Won Hwa Kim, Jaeil Kim, Kyunghwa Han, Shin Hye Hwang, Young Han Lee
Radiologia Medica, published 2025-09-29. DOI: 10.1007/s11547-025-02098-5
Abstract
Purpose: To examine how reading grade level affects stakeholder preferences for LLM-transformed radiology reports, given the trade-off between accuracy and readability.
Material and methods: This retrospective study included 500 radiology reports from academic and community hospitals, covering five imaging modalities. The reports were transformed into 11 reading grade levels (grades 7 to 17) using Gemini. Radiologists, physicians, and laypersons rated accuracy, readability, and preference on a 5-point scale. Errors (generalizations, omissions, and hallucinations) and potential changes in patient management (PCPM) were identified. Ordinal logistic regression was used to identify predictors of preference, and weighted kappa was used to measure interobserver reliability.
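The abstract names weighted kappa but does not state its weighting scheme or how rater pairs were formed. As a hedged illustration only, the sketch below computes Cohen's weighted kappa for two raters scoring the same items on the 5-point scale. The function name `weighted_kappa`, the quadratic default, and the example ratings are assumptions for illustration, not the authors' analysis or data.

```python
from typing import Sequence


def weighted_kappa(
    rater_a: Sequence[int],
    rater_b: Sequence[int],
    categories: Sequence[int] = (1, 2, 3, 4, 5),
    weights: str = "quadratic",
) -> float:
    """Cohen's weighted kappa for two raters scoring the same items on an ordinal scale.

    Hypothetical helper: the weighting scheme ("linear" or "quadratic") is an assumption,
    since the source does not specify which one was used.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    if len(categories) < 2:
        raise ValueError("need at least two ordinal categories")

    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions: observed[i][j] = share of items rated i by A and j by B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions of each rater; their product gives the chance-expected table.
    marg_a = [sum(observed[i][j] for j in range(k)) for i in range(k)]
    marg_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i: int, j: int) -> float:
        # Penalty grows with the distance between categories (0 on the diagonal).
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    obs = sum(disagreement(i, j) * observed[i][j] for i, j in cells)
    exp = sum(disagreement(i, j) * marg_a[i] * marg_b[j] for i, j in cells)
    if exp == 0:
        raise ValueError("kappa is undefined when no disagreement is expected by chance")
    # 1 = perfect agreement, 0 = chance-level agreement, < 0 = worse than chance.
    return 1.0 - obs / exp


if __name__ == "__main__":
    # Hypothetical 5-point preference ratings from two readers (not study data).
    reader_1 = [5, 4, 4, 3, 2, 5, 1, 3]
    reader_2 = [5, 4, 3, 3, 2, 4, 2, 3]
    print(f"quadratic-weighted kappa: {weighted_kappa(reader_1, reader_2):.3f}")
```

Quadratic weights penalize a 1-versus-5 disagreement far more than a 3-versus-4 one, which suits ordinal Likert-type ratings. For real analyses, scikit-learn's `sklearn.metrics.cohen_kappa_score` with `weights="quadratic"` and `labels=[1, 2, 3, 4, 5]` computes the same statistic; dividing the distance by k - 1, as done here, cancels in the ratio and does not change the result.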
Results: Preferences across reading grade levels varied by stakeholder group, imaging modality, and clinical setting. Overall, preference peaked at grade 16 and declined at grade 17, particularly among laypersons. Lower reading grades improved readability but increased errors, whereas higher grades improved accuracy but reduced readability. In multivariable analysis, accuracy was the strongest predictor of preference in all groups (OR: 30.29, 33.05, and 2.16; p < 0.001), followed by readability (OR: 2.73, 1.70, and 2.01; p < 0.001).
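The abstract reports odds ratios without giving the model specification. Assuming a standard proportional-odds (cumulative logit) model for the 5-point preference rating Y, a common form of ordinal logistic regression that the abstract does not explicitly confirm, each odds ratio is the exponentiated coefficient of a predictor x_k:

```latex
\begin{aligned}
\operatorname{logit} P(Y \le j \mid \mathbf{x}) &= \theta_j - \boldsymbol{\beta}^{\top}\mathbf{x}, \qquad j = 1, \dots, 4,\\
\mathrm{OR}_k &= e^{\beta_k}.
\end{aligned}
```

Under this parameterization, the odds of a preference rating above any category j equal exp(βᵀx - θ_j), so a one-unit increase in x_k (for example, one point on the accuracy rating, if accuracy entered the model as a numeric score) multiplies those odds by OR_k, holding the other predictors fixed. For instance, an OR of 2.16 corresponds to β_k = ln 2.16 ≈ 0.77, whereas an OR of 30.29 corresponds to β_k ≈ 3.41.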
Conclusion: Higher reading grade levels, in the range of 12 to 17, were generally preferred because of their better accuracy. Increasing the grade level further reduced readability sharply, which limited preference. These findings highlight the limitations of unsupervised LLM transformations and suggest the need for hybrid approaches that retain the original report while adding explanatory content to balance accuracy and readability.
About the journal:
Felice Perussia founded La radiologia medica in 1914. It is a peer-reviewed journal and the official journal of the Italian Society of Medical and Interventional Radiology (SIRM). Its primary purpose is to disseminate information related to radiology, especially advances in diagnostic imaging and related disciplines. La radiologia medica welcomes original research on both fundamental and clinical aspects of modern radiology, with a particular focus on diagnostic and interventional imaging techniques. It also covers radiotherapy, nuclear medicine, radiobiology, health physics, and artificial intelligence, with an emphasis on their clinical implications. The journal publishes original articles, review articles, editorials, short reports, and letters to the editor. With an esteemed Editorial Board and a selection of insightful reports, it aims to be an indispensable resource for radiologists and professionals in related fields and a platform for international collaboration and knowledge sharing within the radiological community.