Artificial Intelligence Language Models to Translate Professional Radiology Mammography Reports Into Plain Language - Impact on Interpretability and Perception by Patients.
IF 3.8 2区 医学Q1 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING
Dusan Pisarcik, Marc Kissling, Jakob Heimer, Monika Farkas, Cornelia Leo, Rahel A Kubik-Huch, André Euler
{"title":"Artificial Intelligence Language Models to Translate Professional Radiology Mammography Reports Into Plain Language - Impact on Interpretability and Perception by Patients.","authors":"Dusan Pisarcik, Marc Kissling, Jakob Heimer, Monika Farkas, Cornelia Leo, Rahel A Kubik-Huch, André Euler","doi":"10.1016/j.acra.2025.05.065","DOIUrl":null,"url":null,"abstract":"<p><strong>Rationale and objectives: </strong>This study aimed to evaluate the interpretability and patient perception of AI-translated mammography and sonography reports, focusing on comprehensibility, follow-up recommendations, and conveyed empathy using a survey.</p><p><strong>Materials and methods: </strong>In this observational study, three fictional mammography and sonography reports with BI-RADS categories 3, 4, and 5 were created. These reports were repeatedly translated to plain language by three different large language models (LLM: ChatGPT-4, ChatGPT-4o, Google Gemini). In a first step, the best of these repeatedly translated reports for each BI-RADS category and LLM was selected by two experts in breast imaging considering factual correctness, completeness, and quality. In a second step, female participants compared and rated the translated reports regarding comprehensibility, follow-up recommendations, conveyed empathy, and additional value of each report using a survey with Likert scales. Statistical analysis included cumulative link mixed models and the Plackett-Luce model for ranking preferences.</p><p><strong>Results: </strong>40 females participated in the survey. GPT-4 and GPT-4o were rated significantly higher than Gemini across all categories (P<.001). Participants >50 years of age rated the reports significantly higher as compared to participants of 18-29 years of age (P<.05). Higher education predicted lower ratings (P=.02). No prior mammography increased scores (P=.03), and AI-experience had no effect (P=.88). Ranking analysis showed GPT-4o as the most preferred (P=.48), followed by GPT-4 (P=.37), with Gemini ranked last (P=.15).</p><p><strong>Conclusion: </strong>Patient preference differed among AI-translated radiology reports. Compared to a traditional report using radiological language, AI-translated reports add value for patients, enhance comprehensibility and empathy and therefore hold the potential to improve patient communication in breast imaging.</p>","PeriodicalId":50928,"journal":{"name":"Academic Radiology","volume":" ","pages":""},"PeriodicalIF":3.8000,"publicationDate":"2025-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Academic Radiology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1016/j.acra.2025.05.065","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
引用次数: 0
Abstract
Rationale and objectives: This study aimed to evaluate the interpretability and patient perception of AI-translated mammography and sonography reports, focusing on comprehensibility, follow-up recommendations, and conveyed empathy using a survey.
Materials and methods: In this observational study, three fictional mammography and sonography reports with BI-RADS categories 3, 4, and 5 were created. These reports were repeatedly translated to plain language by three different large language models (LLM: ChatGPT-4, ChatGPT-4o, Google Gemini). In a first step, the best of these repeatedly translated reports for each BI-RADS category and LLM was selected by two experts in breast imaging considering factual correctness, completeness, and quality. In a second step, female participants compared and rated the translated reports regarding comprehensibility, follow-up recommendations, conveyed empathy, and additional value of each report using a survey with Likert scales. Statistical analysis included cumulative link mixed models and the Plackett-Luce model for ranking preferences.
Results: 40 females participated in the survey. GPT-4 and GPT-4o were rated significantly higher than Gemini across all categories (P<.001). Participants >50 years of age rated the reports significantly higher as compared to participants of 18-29 years of age (P<.05). Higher education predicted lower ratings (P=.02). No prior mammography increased scores (P=.03), and AI-experience had no effect (P=.88). Ranking analysis showed GPT-4o as the most preferred (P=.48), followed by GPT-4 (P=.37), with Gemini ranked last (P=.15).
Conclusion: Patient preference differed among AI-translated radiology reports. Compared to a traditional report using radiological language, AI-translated reports add value for patients, enhance comprehensibility and empathy and therefore hold the potential to improve patient communication in breast imaging.
期刊介绍:
Academic Radiology publishes original reports of clinical and laboratory investigations in diagnostic imaging, the diagnostic use of radioactive isotopes, computed tomography, positron emission tomography, magnetic resonance imaging, ultrasound, digital subtraction angiography, image-guided interventions and related techniques. It also includes brief technical reports describing original observations, techniques, and instrumental developments; state-of-the-art reports on clinical issues, new technology and other topics of current medical importance; meta-analyses; scientific studies and opinions on radiologic education; and letters to the Editor.