Hong-Seon Lee, Seung-Hyun Song, Chaeri Park, Jeongrok Seo, Won Hwa Kim, Jaeil Kim, Sungjun Kim, Kyunghwa Han, Young Han Lee
{"title":"简化的伦理:在人工智能生成的放射学报告中平衡患者的自主性、理解力和准确性。","authors":"Hong-Seon Lee, Seung-Hyun Song, Chaeri Park, Jeongrok Seo, Won Hwa Kim, Jaeil Kim, Sungjun Kim, Kyunghwa Han, Young Han Lee","doi":"10.1186/s12910-025-01285-3","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Large language models (LLMs) such as GPT-4 are increasingly used to simplify radiology reports and improve patient comprehension. However, excessive simplification may undermine informed consent and autonomy by compromising clinical accuracy. This study investigates the ethical implications of readability thresholds in AI-generated radiology reports, identifying the minimum reading level at which clinical accuracy is preserved.</p><p><strong>Methods: </strong>We retrospectively analyzed 500 computed tomography and magnetic resonance imaging reports from a tertiary hospital. Each report was transformed into 17 versions (reading grade levels 1-17) using GPT-4 Turbo. Readability metrics and word counts were calculated for each version. Clinical accuracy was evaluated using radiologist assessments and PubMed-BERTScore. We identified the first grade level at which a statistically significant decline in accuracy occurred, determining the lowest level that preserved both accuracy and readability. We further assessed potential clinical consequences in reports simplified to the 7th-grade level.</p><p><strong>Results: </strong>Readability scores showed strong correlation with prompted reading levels (r = 0.80-0.84). Accuracy remained stable across grades 13-11 but declined significantly below grade 11. At the 7th-grade level, 20% of reports contained inaccuracies with potential to alter patient management, primarily due to omission, incorrect conversion, or inappropriate generalization. 
The 11th-grade level emerged as the current lower bound for preserving accuracy in LLM-generated radiology reports.</p><p><strong>Conclusions: </strong>Our findings highlight an ethical tension between improving readability and maintaining clinical accuracy. While 7th-grade readability remains an ethical ideal, current AI tools cannot reliably produce accurate reports below the 11th-grade level. Ethical implementation of AI-generated reporting should include layered communication strategies and model transparency to safeguard patient autonomy and comprehension.</p>","PeriodicalId":55348,"journal":{"name":"BMC Medical Ethics","volume":"26 1","pages":"136"},"PeriodicalIF":3.1000,"publicationDate":"2025-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12523008/pdf/","citationCount":"0","resultStr":"{\"title\":\"The ethics of simplification: balancing patient autonomy, comprehension, and accuracy in AI-generated radiology reports.\",\"authors\":\"Hong-Seon Lee, Seung-Hyun Song, Chaeri Park, Jeongrok Seo, Won Hwa Kim, Jaeil Kim, Sungjun Kim, Kyunghwa Han, Young Han Lee\",\"doi\":\"10.1186/s12910-025-01285-3\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Large language models (LLMs) such as GPT-4 are increasingly used to simplify radiology reports and improve patient comprehension. However, excessive simplification may undermine informed consent and autonomy by compromising clinical accuracy. This study investigates the ethical implications of readability thresholds in AI-generated radiology reports, identifying the minimum reading level at which clinical accuracy is preserved.</p><p><strong>Methods: </strong>We retrospectively analyzed 500 computed tomography and magnetic resonance imaging reports from a tertiary hospital. Each report was transformed into 17 versions (reading grade levels 1-17) using GPT-4 Turbo. 
Readability metrics and word counts were calculated for each version. Clinical accuracy was evaluated using radiologist assessments and PubMed-BERTScore. We identified the first grade level at which a statistically significant decline in accuracy occurred, determining the lowest level that preserved both accuracy and readability. We further assessed potential clinical consequences in reports simplified to the 7th-grade level.</p><p><strong>Results: </strong>Readability scores showed strong correlation with prompted reading levels (r = 0.80-0.84). Accuracy remained stable across grades 13-11 but declined significantly below grade 11. At the 7th-grade level, 20% of reports contained inaccuracies with potential to alter patient management, primarily due to omission, incorrect conversion, or inappropriate generalization. The 11th-grade level emerged as the current lower bound for preserving accuracy in LLM-generated radiology reports.</p><p><strong>Conclusions: </strong>Our findings highlight an ethical tension between improving readability and maintaining clinical accuracy. While 7th-grade readability remains an ethical ideal, current AI tools cannot reliably produce accurate reports below the 11th-grade level. 
Ethical implementation of AI-generated reporting should include layered communication strategies and model transparency to safeguard patient autonomy and comprehension.</p>\",\"PeriodicalId\":55348,\"journal\":{\"name\":\"BMC Medical Ethics\",\"volume\":\"26 1\",\"pages\":\"136\"},\"PeriodicalIF\":3.1000,\"publicationDate\":\"2025-10-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12523008/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"BMC Medical Ethics\",\"FirstCategoryId\":\"98\",\"ListUrlMain\":\"https://doi.org/10.1186/s12910-025-01285-3\",\"RegionNum\":1,\"RegionCategory\":\"哲学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ETHICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Medical Ethics","FirstCategoryId":"98","ListUrlMain":"https://doi.org/10.1186/s12910-025-01285-3","RegionNum":1,"RegionCategory":"哲学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ETHICS","Score":null,"Total":0}
The ethics of simplification: balancing patient autonomy, comprehension, and accuracy in AI-generated radiology reports.
Background: Large language models (LLMs) such as GPT-4 are increasingly used to simplify radiology reports and improve patient comprehension. However, excessive simplification may undermine informed consent and autonomy by compromising clinical accuracy. This study investigates the ethical implications of readability thresholds in AI-generated radiology reports, identifying the minimum reading level at which clinical accuracy is preserved.
Methods: We retrospectively analyzed 500 computed tomography and magnetic resonance imaging reports from a tertiary hospital. Each report was transformed into 17 versions (reading grade levels 1-17) using GPT-4 Turbo. Readability metrics and word counts were calculated for each version. Clinical accuracy was evaluated using radiologist assessments and PubMed-BERTScore. We identified the first grade level at which a statistically significant decline in accuracy occurred, determining the lowest level that preserved both accuracy and readability. We further assessed potential clinical consequences in reports simplified to the 7th-grade level.
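The abstract reports that readability metrics were computed for each simplified version but does not name a specific metric; a common choice for grade-level scoring is the Flesch-Kincaid grade level. A minimal sketch of that computation, using a crude vowel-group syllable heuristic (the heuristic and example texts are illustrative assumptions, not the study's actual pipeline):

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: one syllable per contiguous vowel group, minimum 1.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text: str) -> float:
    # Flesch-Kincaid grade level:
    #   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)

# A simplified sentence scores far lower than dense radiology prose.
simple = "The scan shows a small cyst. No further action is needed."
dense = "Radiological interpretation necessitates comprehensive anatomical understanding."
print(fk_grade(simple), fk_grade(dense))
```

In a pipeline like the one described, a score such as this would be computed for each of the 17 GPT-4 Turbo versions and compared against the grade level requested in the prompt.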
Results: Readability scores correlated strongly with the prompted reading levels (r = 0.80-0.84). Accuracy remained stable from grade 13 down to grade 11 but declined significantly below grade 11. At the 7th-grade level, 20% of reports contained inaccuracies with the potential to alter patient management, primarily due to omission, incorrect conversion, or inappropriate generalization. The 11th-grade level emerged as the current lower bound for preserving accuracy in LLM-generated radiology reports.
Conclusions: Our findings highlight an ethical tension between improving readability and maintaining clinical accuracy. While 7th-grade readability remains an ethical ideal, current AI tools cannot reliably produce accurate reports below the 11th-grade level. Ethical implementation of AI-generated reporting should include layered communication strategies and model transparency to safeguard patient autonomy and comprehension.
Journal overview:
BMC Medical Ethics is an open access journal publishing original peer-reviewed research articles in relation to the ethical aspects of biomedical research and clinical practice, including professional choices and conduct, medical technologies, healthcare systems and health policies.