'ChatGPT can make mistakes' warnings fail: A randomized controlled trial.

IF 5.2 | CAS Tier 1 (Education) | Q1 EDUCATION, SCIENTIFIC DISCIPLINES
Yavuz Selim Kıyak, Özlem Coşkun, Işıl İrem Budakoğlu
{"title":"一项随机对照试验:“ChatGPT会犯错”警告失败。","authors":"Yavuz Selim Kıyak, Özlem Coşkun, Işıl İrem Budakoğlu","doi":"10.1111/medu.70056","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Warnings are commonly used to signal the fallibility of AI systems like ChatGPT in clinical decision-making. Yet, little is known about whether such disclaimers influence medical students' diagnostic behaviour. Drawing on the Judge-Advisor System (JAS) theory, we investigated whether the warning alters advice-taking behaviour by modifying perceived advisor credibility.</p><p><strong>Method: </strong>In this randomized controlled trial, 186 fourth-year medical students evaluated three clinical vignettes with two diagnostic options. Each case was specifically designed to include the presentations of both diagnoses to make the case ambiguous. Students were randomly assigned to receive feedback either with (warning arm) or without (no-warning arm) a prominently displayed warning ('ChatGPT can make mistakes. Check important info'.). After submitting their initial response, students received ChatGPT-attributed disagreeing diagnostic feedback explaining why the alternate diagnosis was correct. Then they were given the opportunity to revise their original choice. Advice-taking was measured by whether students changed their diagnosis after viewing AI input. We analysed change rates, weight-of-advice (WoA) and used mixed-effects models to assess intervention effects.</p><p><strong>Results: </strong>The warning did not influence diagnostic changes (15.3% no-warning vs. 15.9% warning; OR = 1.09, 95% CI: 0.46-2.59, p = 0.84). The WoA was 0.15 (SD = 0.36), significantly lower than the 0.30 average in prior JAS meta-analysis (p < 0.001). Among students who retained their original diagnosis, the warning group showed a tendency toward providing explanations on why they disagree with the AI advisor (60% vs. 51%, p = 0.059).</p><p><strong>Conclusions: </strong>The students underweight AI's diagnostic advice. The disclaimer did not alter students' use of AI advice, suggesting that their perceived credibility of ChatGPT was already near a behavioural floor. This finding supports the existence of a credibility threshold, beyond which additional cautionary cues have limited effect. Our results refine advice-taking theory and signal that simple warnings may be insufficient to ensure calibrated trust in AI-supported learning.</p>","PeriodicalId":18370,"journal":{"name":"Medical Education","volume":" ","pages":""},"PeriodicalIF":5.2000,"publicationDate":"2025-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"'ChatGPT can make mistakes' warnings fail: A randomized controlled trial.\",\"authors\":\"Yavuz Selim Kıyak, Özlem Coşkun, Işıl İrem Budakoğlu\",\"doi\":\"10.1111/medu.70056\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Warnings are commonly used to signal the fallibility of AI systems like ChatGPT in clinical decision-making. Yet, little is known about whether such disclaimers influence medical students' diagnostic behaviour. Drawing on the Judge-Advisor System (JAS) theory, we investigated whether the warning alters advice-taking behaviour by modifying perceived advisor credibility.</p><p><strong>Method: </strong>In this randomized controlled trial, 186 fourth-year medical students evaluated three clinical vignettes with two diagnostic options. 
Each case was specifically designed to include the presentations of both diagnoses to make the case ambiguous. Students were randomly assigned to receive feedback either with (warning arm) or without (no-warning arm) a prominently displayed warning ('ChatGPT can make mistakes. Check important info'.). After submitting their initial response, students received ChatGPT-attributed disagreeing diagnostic feedback explaining why the alternate diagnosis was correct. Then they were given the opportunity to revise their original choice. Advice-taking was measured by whether students changed their diagnosis after viewing AI input. We analysed change rates, weight-of-advice (WoA) and used mixed-effects models to assess intervention effects.</p><p><strong>Results: </strong>The warning did not influence diagnostic changes (15.3% no-warning vs. 15.9% warning; OR = 1.09, 95% CI: 0.46-2.59, p = 0.84). The WoA was 0.15 (SD = 0.36), significantly lower than the 0.30 average in prior JAS meta-analysis (p < 0.001). Among students who retained their original diagnosis, the warning group showed a tendency toward providing explanations on why they disagree with the AI advisor (60% vs. 51%, p = 0.059).</p><p><strong>Conclusions: </strong>The students underweight AI's diagnostic advice. The disclaimer did not alter students' use of AI advice, suggesting that their perceived credibility of ChatGPT was already near a behavioural floor. This finding supports the existence of a credibility threshold, beyond which additional cautionary cues have limited effect. Our results refine advice-taking theory and signal that simple warnings may be insufficient to ensure calibrated trust in AI-supported learning.</p>\",\"PeriodicalId\":18370,\"journal\":{\"name\":\"Medical Education\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":5.2000,\"publicationDate\":\"2025-09-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Medical Education\",\"FirstCategoryId\":\"95\",\"ListUrlMain\":\"https://doi.org/10.1111/medu.70056\",\"RegionNum\":1,\"RegionCategory\":\"教育学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"EDUCATION, SCIENTIFIC DISCIPLINES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medical Education","FirstCategoryId":"95","ListUrlMain":"https://doi.org/10.1111/medu.70056","RegionNum":1,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION, SCIENTIFIC DISCIPLINES","Score":null,"Total":0}
Citations: 0

Abstract


Background: Warnings are commonly used to signal the fallibility of AI systems like ChatGPT in clinical decision-making. Yet little is known about whether such disclaimers influence medical students' diagnostic behaviour. Drawing on Judge-Advisor System (JAS) theory, we investigated whether such a warning alters advice-taking behaviour by modifying perceived advisor credibility.

Method: In this randomized controlled trial, 186 fourth-year medical students evaluated three clinical vignettes, each with two diagnostic options. Each case was specifically designed to include presenting features of both diagnoses, making it ambiguous. Students were randomly assigned to receive feedback either with (warning arm) or without (no-warning arm) a prominently displayed warning ('ChatGPT can make mistakes. Check important info.'). After submitting their initial response, students received disagreeing diagnostic feedback attributed to ChatGPT, explaining why the alternative diagnosis was correct. They were then given the opportunity to revise their original choice. Advice-taking was measured by whether students changed their diagnosis after viewing the AI input. We analysed change rates and weight of advice (WoA), and used mixed-effects models to assess intervention effects.
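For context, the JAS literature conventionally quantifies weight of advice as the proportion of the distance between a judge's initial estimate and the advisor's recommendation that the final estimate covers. The abstract does not spell out the exact operationalisation used in this study, so the standard definition is shown here as an assumption:

\[
\mathrm{WoA} = \frac{\lvert J_{\text{final}} - J_{\text{initial}} \rvert}{\lvert A - J_{\text{initial}} \rvert}
\]

Under this definition, WoA = 0 means the advice was ignored and WoA = 1 means it was fully adopted, so the mean of 0.15 reported below corresponds to students moving, on average, only about 15% of the way toward ChatGPT's suggestion.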

Results: The warning did not influence diagnostic changes (15.3% no-warning vs. 15.9% warning; OR = 1.09, 95% CI: 0.46-2.59, p = 0.84). The mean WoA was 0.15 (SD = 0.36), significantly lower than the 0.30 average reported in a prior JAS meta-analysis (p < 0.001). Among students who retained their original diagnosis, the warning group showed a trend toward explaining why they disagreed with the AI advisor (60% vs. 51%, p = 0.059).
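As a rough check on the effect-size arithmetic, the sketch below recomputes an unadjusted odds ratio with a Wald 95% CI from reconstructed counts. The counts are hypothetical (assuming 93 students per arm and three vignettes each, i.e. 279 decisions per arm; the abstract reports only percentages), and the published OR of 1.09 comes from a mixed-effects model that accounts for repeated decisions within students, so the naive 2x2 figure differs slightly:

```python
import math

def odds_ratio_wald(changed_a, kept_a, changed_b, kept_b, z=1.96):
    """Unadjusted odds ratio (arm A vs. arm B) with a Wald CI,
    computed from a 2x2 table of changed vs. kept diagnoses."""
    or_ = (changed_a / kept_a) / (changed_b / kept_b)
    # Standard error of log(OR) for a 2x2 table.
    se = math.sqrt(1/changed_a + 1/kept_a + 1/changed_b + 1/kept_b)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Hypothetical reconstruction (NOT the paper's raw data): 279 decisions
# per arm; 44/279 ~ 15.8% changed in the warning arm, 43/279 ~ 15.4%
# in the no-warning arm, approximating the reported 15.9% vs. 15.3%.
or_, lo, hi = odds_ratio_wald(44, 235, 43, 236)
print(f"unadjusted OR = {or_:.2f}, 95% CI: {lo:.2f}-{hi:.2f}")
```

The model-based CI in the paper (0.46-2.59) is wider than this naive calculation would suggest, which is consistent with the mixed-effects model absorbing the clustering of the three decisions made by each student.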

Conclusions: Students underweighted the AI's diagnostic advice. The disclaimer did not alter students' use of AI advice, suggesting that the perceived credibility of ChatGPT was already near a behavioural floor. This finding supports the existence of a credibility threshold beyond which additional cautionary cues have limited effect. Our results refine advice-taking theory and signal that simple warnings may be insufficient to ensure calibrated trust in AI-supported learning.

Source journal
Medical Education (Medicine - Health Care)
CiteScore: 8.40
Self-citation rate: 10.00%
Annual publications: 279
Review time: 4-8 weeks
Journal introduction: Medical Education seeks to be the pre-eminent journal in the field of education for health care professionals, and publishes material of the highest quality, reflecting worldwide or provocative issues and perspectives. The journal welcomes high-quality papers on all aspects of health professional education, including:
- undergraduate education
- postgraduate training
- continuing professional development
- interprofessional education