Artificial Intelligence Language Models to Translate Professional Radiology Mammography Reports Into Plain Language - Impact on Interpretability and Perception by Patients.

IF 3.8 · Zone 2 (Medicine) · Q1 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING

Academic Radiology Pub Date : 2025-06-19 DOI:10.1016/j.acra.2025.05.065

Dusan Pisarcik, Marc Kissling, Jakob Heimer, Monika Farkas, Cornelia Leo, Rahel A Kubik-Huch, André Euler


Citations: 0

Abstract

Rationale and objectives: This study used a survey to evaluate the interpretability and patient perception of AI-translated mammography and sonography reports, focusing on comprehensibility, follow-up recommendations, and conveyed empathy.

Materials and methods: In this observational study, three fictional mammography and sonography reports with BI-RADS categories 3, 4, and 5 were created. These reports were repeatedly translated to plain language by three different large language models (LLM: ChatGPT-4, ChatGPT-4o, Google Gemini). In a first step, the best of these repeatedly translated reports for each BI-RADS category and LLM was selected by two experts in breast imaging considering factual correctness, completeness, and quality. In a second step, female participants compared and rated the translated reports regarding comprehensibility, follow-up recommendations, conveyed empathy, and additional value of each report using a survey with Likert scales. Statistical analysis included cumulative link mixed models and the Plackett-Luce model for ranking preferences.
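For readers unfamiliar with cumulative link mixed models, a generic form for Likert ratings is sketched below. The abstract does not state the link function or the random-effects structure, so the logit link and the single per-participant random intercept $u_i$ are assumptions made for illustration only.

```latex
% Generic cumulative link mixed model (assumed logit link, assumed random intercept).
% Y_{ik}: Likert rating by participant i for report k, with ordered levels j = 1, ..., J
% \theta_j: threshold between levels j and j+1
% x_{ik}: fixed-effect covariates (e.g. LLM, age group, education)
% u_i: random intercept for participant i
\operatorname{logit} P(Y_{ik} \le j) = \theta_j - \left( x_{ik}^{\top} \beta + u_i \right),
\qquad j = 1, \dots, J - 1,
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2)
```

Under this parameterization, a positive coefficient in $\beta$ shifts probability toward higher ratings.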

Results: Forty women participated in the survey. GPT-4 and GPT-4o were rated significantly higher than Gemini across all categories (P<.001). Participants >50 years of age rated the reports significantly higher than participants aged 18-29 years (P<.05). Higher education predicted lower ratings (P=.02). Participants without prior mammography gave higher scores (P=.03), and AI experience had no effect (P=.88). Ranking analysis showed GPT-4o as the most preferred (P=.48), followed by GPT-4 (P=.37), with Gemini ranked last (P=.15).
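The three values reported for the ranking analysis (.48, .37, .15) sum to 1, which is consistent with Plackett-Luce worth parameters rather than significance levels, although the abstract does not say so explicitly. The following minimal Python sketch, under that assumption, shows how such worths translate into the probability of each complete ranking. The dictionary `worth` and the function `ranking_probability` are illustrative names, not part of the study.

```python
from itertools import permutations

# Values reported in the abstract for the ranking analysis. Treating them as
# Plackett-Luce worth parameters is an assumption (they sum to 1).
worth = {"GPT-4o": 0.48, "GPT-4": 0.37, "Gemini": 0.15}


def ranking_probability(ranking, worth):
    """Plackett-Luce probability of a full ranking, listed best to worst."""
    remaining = sum(worth[item] for item in ranking)
    prob = 1.0
    for item in ranking:
        # Chance that this item is picked first among those still unranked.
        prob *= worth[item] / remaining
        remaining -= worth[item]
    return prob


# Probability of every possible ordering of the three models.
ranking_probs = {r: ranking_probability(r, worth) for r in permutations(worth)}

for ranking, p in sorted(ranking_probs.items(), key=lambda kv: -kv[1]):
    print(" > ".join(ranking), round(p, 3))
```

Under this reading, a worth of .48 means GPT-4o would be chosen first about 48% of the time when all three translations are offered together.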

Conclusion: Patient preference differed among AI-translated radiology reports. Compared to a traditional report using radiological language, AI-translated reports add value for patients, enhance comprehensibility and empathy and therefore hold the potential to improve patient communication in breast imaging.


Source journal

Academic Radiology (Medicine: Nuclear Medicine)

CiteScore: 7.60

Self-citation rate: 10.40%

Articles published: 432

Review time: 18 days

Journal description: Academic Radiology publishes original reports of clinical and laboratory investigations in diagnostic imaging, the diagnostic use of radioactive isotopes, computed tomography, positron emission tomography, magnetic resonance imaging, ultrasound, digital subtraction angiography, image-guided interventions and related techniques. It also includes brief technical reports describing original observations, techniques, and instrumental developments; state-of-the-art reports on clinical issues, new technology and other topics of current medical importance; meta-analyses; scientific studies and opinions on radiologic education; and letters to the Editor.