Appropriateness of acute breast symptom recommendations provided by ChatGPT

IF 1.5; Zone 4 (Medicine); Q3, Radiology, Nuclear Medicine & Medical Imaging

Clinical Imaging. Pub Date: 2025-06-16. DOI: 10.1016/j.clinimag.2025.110549

Clifton Byrd, Chase Kingsbury, Bethany Niell, Kimberly Funaro, Asha Bhatt, R. Jared Weinfurtner, Dana Ataya

Clinical Imaging, volume 125, Article 110549. Journal article; not open access. Source: https://www.sciencedirect.com/science/article/pii/S0899707125001494

Citations: 0

Abstract

Appropriateness of acute breast symptom recommendations provided by ChatGPT

Purpose

We evaluated the accuracy of ChatGPT-3.5's responses to common questions regarding acute breast symptoms and explored whether using lay language, as opposed to medical language, affected the accuracy of the responses.

Methods

Questions were formulated addressing acute breast conditions, informed by the American College of Radiology (ACR) Appropriateness Criteria (AC) and our clinical experience at a tertiary referral breast center. Of these, seven addressed the most common acute breast symptoms, nine addressed pregnancy-associated breast symptoms, and four addressed specific management and imaging recommendations for a palpable breast abnormality. Questions were submitted three times to ChatGPT-3.5 and all responses were assessed by five fellowship-trained breast radiologists. Evaluation criteria included clinical judgment and adherence to the ACR guidelines, with responses scored as: 1) “appropriate,” 2) “inappropriate” if any response contained inappropriate information, or 3) “unreliable” if responses were inconsistent. A majority vote determined the appropriateness for each question.
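The per-question majority vote described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `majority_label` and the handling of a split with no strict majority (returning `None`) are assumptions, since the abstract does not say how such splits were resolved.

```python
from collections import Counter

# The three labels named in the Methods section.
LABELS = ("appropriate", "inappropriate", "unreliable")


def majority_label(ratings):
    """Return the label chosen by more than half of the raters.

    Returns None when no label has a strict majority (for example a
    2-2-1 split among five raters). That tie rule is an assumption;
    the abstract does not describe one.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    unknown = set(ratings) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    label, count = Counter(ratings).most_common(1)[0]
    return label if count > len(ratings) / 2 else None


# Hypothetical ratings from five radiologists for one question.
example = ["appropriate", "appropriate", "appropriate",
           "inappropriate", "unreliable"]
print(majority_label(example))
```

With five raters and three possible labels, a strict majority needs at least three matching ratings, so a 2-2-1 split is the only case this sketch leaves undecided.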

Results

ChatGPT-3.5-generated responses were appropriate for 7/7 (100%) questions regarding common acute breast symptoms, whether phrased colloquially or in standard medical terminology. In contrast, ChatGPT-3.5-generated responses were appropriate for 3/9 (33%) questions about pregnancy-associated breast symptoms and 3/4 (75%) questions about management and imaging recommendations for a palpable breast abnormality.
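As a quick arithmetic check, the reported percentages follow from the question counts when rounded to the nearest whole percent. The sketch below only re-derives the figures stated above; the dictionary keys and the helper name `percent` are illustrative.

```python
# (appropriate, total, reported percent) per question category, as stated in the Results.
reported = {
    "common acute breast symptoms": (7, 7, 100),
    "pregnancy-associated breast symptoms": (3, 9, 33),
    "palpable breast abnormality": (3, 4, 75),
}


def percent(appropriate, total):
    """Share of appropriate responses, rounded to the nearest whole percent."""
    return round(100 * appropriate / total)


for category, (k, n, pct) in reported.items():
    # Each recomputed value should match the percentage given in the abstract.
    assert percent(k, n) == pct
    print(f"{category}: {k}/{n} = {percent(k, n)}%")
```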

Conclusion

ChatGPT-3.5 can automate healthcare information related to appropriate management of acute breast symptoms when prompted with either standard medical terminology or lay phrasing of the questions. However, physician oversight remains critical given the presence of inappropriate recommendations for pregnancy-associated breast symptoms and management of palpable abnormalities.


Source journal

Clinical Imaging (Medicine: Nuclear Medicine)

CiteScore: 4.60

Self-citation rate: 0.00%

Articles published: 265

Review time: 35 days

About the journal: The mission of Clinical Imaging is to publish, in a timely manner, the very best radiology research from the United States and around the world with special attention to the impact of medical imaging on patient care. The journal's publications cover all imaging modalities, radiology issues related to patients, policy and practice improvements, and clinically-oriented imaging physics and informatics. The journal is a valuable resource for practicing radiologists, radiologists-in-training and other clinicians with an interest in imaging. Papers are carefully peer-reviewed and selected by our experienced subject editors who are leading experts spanning the range of imaging sub-specialties, which include: Body Imaging; Breast Imaging; Cardiothoracic Imaging; Imaging Physics and Informatics; Molecular Imaging and Nuclear Medicine; Musculoskeletal and Emergency Imaging; Neuroradiology; Practice, Policy & Education; Pediatric Imaging; Vascular and Interventional Radiology.