Bias Sensitivity in Diagnostic Decision-Making: Comparing ChatGPT with Residents.

IF 4.3 · JCR Q1 (Health Care Sciences & Services) · CAS Tier 2 (Medicine)
Journal of General Internal Medicine · Pub Date: 2025-03-01 · Epub Date: 2024-11-07 · DOI: 10.1007/s11606-024-09177-9
Henk G Schmidt, Jerome I Rotgans, Silvia Mamede
{"title":"诊断决策中的偏差敏感性:将 ChatGPT 与住院医生进行比较。","authors":"Henk G Schmidt, Jerome I Rotgans, Silvia Mamede","doi":"10.1007/s11606-024-09177-9","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Diagnostic errors, often due to biases in clinical reasoning, significantly affect patient care. While artificial intelligence chatbots like ChatGPT could help mitigate such biases, their potential susceptibility to biases is unknown.</p><p><strong>Methods: </strong>This study evaluated diagnostic accuracy of ChatGPT against the performance of 265 medical residents in five previously published experiments aimed at inducing bias. The residents worked in several major teaching hospitals in the Netherlands. The biases studied were case-intrinsic (presence of salient distracting findings in the patient history, effects of disruptive patient behaviors) and situational (prior availability of a look-alike patient). ChatGPT's accuracy in identifying the most-likely diagnosis was measured.</p><p><strong>Results: </strong>Diagnostic accuracy of residents and ChatGPT was equivalent. For clinical cases involving case-intrinsic bias, both ChatGPT and the residents exhibited a decline in diagnostic accuracy. Residents' accuracy decreased on average 12%, while the accuracy of ChatGPT 4.0 decreased 21%. Accuracy of ChatGPT 3.5 decreased 9%. These findings suggest that, like human diagnosticians, ChatGPT is sensitive to bias when the biasing information is part of the patient history. When the biasing information was extrinsic to the case in the form of the prior availability of a look-alike case, residents' accuracy decreased by 15%. By contrast, ChatGPT's performance was not affected by the biasing information. Chi-square goodness-of-fit tests corroborated these outcomes.</p><p><strong>Conclusions: </strong>It seems that, while ChatGPT is not sensitive to bias when biasing information is situational, it is sensitive to bias when the biasing information is part of the patient's disease history. Its utility in diagnostic support has potential, but caution is advised. Future research should enhance AI's bias detection and mitigation to make it truly useful for diagnostic support.</p>","PeriodicalId":15860,"journal":{"name":"Journal of General Internal Medicine","volume":" ","pages":"790-795"},"PeriodicalIF":4.3000,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11914423/pdf/","citationCount":"0","resultStr":"{\"title\":\"Bias Sensitivity in Diagnostic Decision-Making: Comparing ChatGPT with Residents.\",\"authors\":\"Henk G Schmidt, Jerome I Rotgans, Silvia Mamede\",\"doi\":\"10.1007/s11606-024-09177-9\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Diagnostic errors, often due to biases in clinical reasoning, significantly affect patient care. While artificial intelligence chatbots like ChatGPT could help mitigate such biases, their potential susceptibility to biases is unknown.</p><p><strong>Methods: </strong>This study evaluated diagnostic accuracy of ChatGPT against the performance of 265 medical residents in five previously published experiments aimed at inducing bias. The residents worked in several major teaching hospitals in the Netherlands. 
The biases studied were case-intrinsic (presence of salient distracting findings in the patient history, effects of disruptive patient behaviors) and situational (prior availability of a look-alike patient). ChatGPT's accuracy in identifying the most-likely diagnosis was measured.</p><p><strong>Results: </strong>Diagnostic accuracy of residents and ChatGPT was equivalent. For clinical cases involving case-intrinsic bias, both ChatGPT and the residents exhibited a decline in diagnostic accuracy. Residents' accuracy decreased on average 12%, while the accuracy of ChatGPT 4.0 decreased 21%. Accuracy of ChatGPT 3.5 decreased 9%. These findings suggest that, like human diagnosticians, ChatGPT is sensitive to bias when the biasing information is part of the patient history. When the biasing information was extrinsic to the case in the form of the prior availability of a look-alike case, residents' accuracy decreased by 15%. By contrast, ChatGPT's performance was not affected by the biasing information. Chi-square goodness-of-fit tests corroborated these outcomes.</p><p><strong>Conclusions: </strong>It seems that, while ChatGPT is not sensitive to bias when biasing information is situational, it is sensitive to bias when the biasing information is part of the patient's disease history. Its utility in diagnostic support has potential, but caution is advised. Future research should enhance AI's bias detection and mitigation to make it truly useful for diagnostic support.</p>\",\"PeriodicalId\":15860,\"journal\":{\"name\":\"Journal of General Internal Medicine\",\"volume\":\" \",\"pages\":\"790-795\"},\"PeriodicalIF\":4.3000,\"publicationDate\":\"2025-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11914423/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of General Internal Medicine\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1007/s11606-024-09177-9\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/11/7 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q1\",\"JCRName\":\"HEALTH CARE SCIENCES & SERVICES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of General Internal Medicine","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1007/s11606-024-09177-9","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/11/7 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
Citations: 0

Abstract

Bias Sensitivity in Diagnostic Decision-Making: Comparing ChatGPT with Residents.

Background: Diagnostic errors, often due to biases in clinical reasoning, significantly affect patient care. While artificial intelligence chatbots like ChatGPT could help mitigate such biases, their potential susceptibility to biases is unknown.

Methods: This study evaluated the diagnostic accuracy of ChatGPT against the performance of 265 medical residents in five previously published experiments designed to induce bias. The residents worked in several major teaching hospitals in the Netherlands. The biases studied were case-intrinsic (presence of salient distracting findings in the patient history, effects of disruptive patient behaviors) and situational (prior availability of a look-alike patient). ChatGPT's accuracy in identifying the most likely diagnosis was measured.
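
To make the comparison protocol concrete, here is a minimal sketch of how such an evaluation might be scripted. It is not the authors' code: the `cases` list, the prompt wording, and the substring scoring rule are hypothetical stand-ins for the study's materials and grading procedure; only the OpenAI Python SDK calls (openai>=1.0) reflect a real API.

```python
# Hedged sketch of a vignette-based evaluation loop; cases and scoring
# are hypothetical placeholders, not the study's materials.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

cases = [
    {"vignette": "A 54-year-old man presents with ...", "answer": "pulmonary embolism"},
    # ... one entry per clinical case, with and without biasing information
]

correct = 0
for case in cases:
    resp = client.chat.completions.create(
        model="gpt-4",  # the paper compares ChatGPT 3.5 and 4.0
        messages=[{
            "role": "user",
            "content": f"{case['vignette']}\n\n"
                       "What is the most likely diagnosis? "
                       "Answer with the single most likely diagnosis only.",
        }],
    )
    prediction = resp.choices[0].message.content.strip().lower()
    # Naive exact-substring scoring; the study presumably used expert grading.
    correct += case["answer"] in prediction

print(f"accuracy: {correct / len(cases):.0%}")
```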

Results: Diagnostic accuracy of residents and ChatGPT was equivalent. For clinical cases involving case-intrinsic bias, both ChatGPT and the residents exhibited a decline in diagnostic accuracy: residents' accuracy decreased on average by 12%, the accuracy of ChatGPT 4.0 by 21%, and that of ChatGPT 3.5 by 9%. These findings suggest that, like human diagnosticians, ChatGPT is sensitive to bias when the biasing information is part of the patient history. When the biasing information was extrinsic to the case, in the form of the prior availability of a look-alike case, residents' accuracy decreased by 15%; by contrast, ChatGPT's performance was not affected. Chi-square goodness-of-fit tests corroborated these outcomes.
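
The abstract reports chi-square goodness-of-fit tests but no test statistics. As an illustration only, here is a minimal sketch of such a test with entirely hypothetical counts, using ChatGPT's correct/incorrect tallies as the observed values and the residents' accuracy as the expected distribution:

```python
# Hedged sketch of a chi-square goodness-of-fit check; all counts below
# are hypothetical placeholders, not the study's data.
from scipy.stats import chisquare

n_runs = 100              # hypothetical number of ChatGPT case presentations
resident_accuracy = 0.58  # hypothetical resident accuracy under bias

observed = [47, 53]       # hypothetical ChatGPT [correct, incorrect] counts
expected = [resident_accuracy * n_runs, (1 - resident_accuracy) * n_runs]

stat, p = chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {stat:.2f}, p = {p:.3f}")
# A non-significant p would indicate ChatGPT's hit rate does not deviate
# from the resident-derived expectation in this condition.
```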

Conclusions: While ChatGPT appears insensitive to bias when the biasing information is situational, it is sensitive to bias when that information is part of the patient's disease history. It has potential utility in diagnostic support, but caution is advised. Future research should enhance AI's bias detection and mitigation to make it truly useful for diagnostic support.

Source journal
Journal of General Internal Medicine (Medicine: Internal Medicine)
CiteScore: 7.70
Self-citation rate: 5.30%
Annual publications: 749
Review turnaround: 3-6 weeks
Journal overview: The Journal of General Internal Medicine is the official journal of the Society of General Internal Medicine. It promotes improved patient care, research, and education in primary care, general internal medicine, and hospital medicine. Its articles focus on topics such as clinical medicine, epidemiology, prevention, health care delivery, curriculum development, and numerous other non-traditional themes, in addition to classic clinical research on problems in internal medicine.