Assessing the performance of ChatGPT in medical ethical decision-making: a comparative study with USMLE-based scenarios.

IF 3.3 2区 哲学 Q1 ETHICS
Ali A Khan, Ali R Khan, Saminah Munshi, Hari Dandapani, Mohamed Jimale, Franck M Bogni, Hussain Khawaja
{"title":"Assessing the performance of ChatGPT in medical ethical decision-making: a comparative study with USMLE-based scenarios.","authors":"Ali A Khan, Ali R Khan, Saminah Munshi, Hari Dandapani, Mohamed Jimale, Franck M Bogni, Hussain Khawaja","doi":"10.1136/jme-2024-110240","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>The integration of artificial intelligence (AI) into healthcare introduces innovative possibilities but raises ethical, legal and professional concerns. Assessing the performance of AI in core components of the United States Medical Licensing Examination (USMLE), such as communication skills, ethics, empathy and professionalism, is crucial. This study evaluates how well ChatGPT versions 3.5 and 4.0 handle complex medical scenarios using USMLE-Rx, AMBOSS and UWorld question banks, aiming to understand its ability to navigate patient interactions according to medical ethics and standards.</p><p><strong>Methods: </strong>We compiled 273 questions from AMBOSS, USMLE-Rx and UWorld, focusing on communication, social sciences, healthcare policy and ethics. GPT-3.5 and GPT-4 were tasked with answering and justifying their choices in new chat sessions to minimise model interference. Responses were compared against question bank rationales and average student performance to evaluate AI effectiveness in medical ethical decision-making.</p><p><strong>Results: </strong>GPT-3.5 answered 38.9% correctly in AMBOSS, 54.1% in USMLE-Rx and 57.4% in UWorld, with rationale accuracy rates of 83.3%, 90.0% and 87.0%, respectively. GPT-4 answered 75.9% correctly in AMBOSS, 64.9% in USMLE-Rx and 79.6% in UWorld, with rationale accuracy rates of 85.4%, 88.9%, and 98.8%, respectively. Both versions generally scored below average student performance, except GPT-4 in UWorld.</p><p><strong>Conclusion: </strong>ChatGPT, particularly version 4.0, shows potential in navigating ethical and interpersonal medical scenarios. However, human reasoning currently surpasses AI in average performance. Continued development and training of AI systems can enhance proficiency in these critical healthcare aspects.</p>","PeriodicalId":16317,"journal":{"name":"Journal of Medical Ethics","volume":" ","pages":""},"PeriodicalIF":3.3000,"publicationDate":"2025-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Medical Ethics","FirstCategoryId":"98","ListUrlMain":"https://doi.org/10.1136/jme-2024-110240","RegionNum":2,"RegionCategory":"哲学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ETHICS","Score":null,"Total":0}
引用次数: 0

Abstract

Introduction: The integration of artificial intelligence (AI) into healthcare introduces innovative possibilities but raises ethical, legal and professional concerns. Assessing the performance of AI in core components of the United States Medical Licensing Examination (USMLE), such as communication skills, ethics, empathy and professionalism, is crucial. This study evaluates how well ChatGPT versions 3.5 and 4.0 handle complex medical scenarios using USMLE-Rx, AMBOSS and UWorld question banks, aiming to understand its ability to navigate patient interactions according to medical ethics and standards.

Methods: We compiled 273 questions from AMBOSS, USMLE-Rx and UWorld, focusing on communication, social sciences, healthcare policy and ethics. GPT-3.5 and GPT-4 were tasked with answering and justifying their choices in new chat sessions to minimise model interference. Responses were compared against question bank rationales and average student performance to evaluate AI effectiveness in medical ethical decision-making.

Results: GPT-3.5 answered 38.9% correctly in AMBOSS, 54.1% in USMLE-Rx and 57.4% in UWorld, with rationale accuracy rates of 83.3%, 90.0% and 87.0%, respectively. GPT-4 answered 75.9% correctly in AMBOSS, 64.9% in USMLE-Rx and 79.6% in UWorld, with rationale accuracy rates of 85.4%, 88.9%, and 98.8%, respectively. Both versions generally scored below average student performance, except GPT-4 in UWorld.

Conclusion: ChatGPT, particularly version 4.0, shows potential in navigating ethical and interpersonal medical scenarios. However, human reasoning currently surpasses AI in average performance. Continued development and training of AI systems can enhance proficiency in these critical healthcare aspects.

求助全文
约1分钟内获得全文 求助全文
来源期刊
Journal of Medical Ethics
Journal of Medical Ethics 医学-医学:伦理
CiteScore
7.80
自引率
9.80%
发文量
164
审稿时长
4-8 weeks
期刊介绍: Journal of Medical Ethics is a leading international journal that reflects the whole field of medical ethics. The journal seeks to promote ethical reflection and conduct in scientific research and medical practice. It features articles on various ethical aspects of health care relevant to health care professionals, members of clinical ethics committees, medical ethics professionals, researchers and bioscientists, policy makers and patients. Subscribers to the Journal of Medical Ethics also receive Medical Humanities journal at no extra cost. JME is the official journal of the Institute of Medical Ethics.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信