Performance Comparison of Junior Residents and ChatGPT in the Objective Structured Clinical Examination (OSCE) for Medical History Taking and Documentation of Medical Records: Development and Usability Study.

Impact Factor 3.2 · Q1, Education, Scientific Disciplines
Ting-Yun Huang, Pei Hsing Hsieh, Yung-Chun Chang
JMIR Medical Education, vol. 10, e59902. Published 2024-11-21. DOI: 10.2196/59902
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11612517/pdf/

Abstract

Background: This study explores the cutting-edge abilities of large language models (LLMs) such as ChatGPT in medical history taking and medical record documentation, with a focus on their practical effectiveness in clinical settings, an area vital for the progress of medical artificial intelligence.

Objective: Our aim was to assess the capability of ChatGPT versions 3.5 and 4.0 in performing medical history taking and medical record documentation in simulated clinical environments. The study compared the performance of nonmedical individuals using ChatGPT with that of junior medical residents.

Methods: A simulation involving standardized patients was designed to mimic authentic medical history-taking interactions. Five nonmedical participants used ChatGPT versions 3.5 and 4.0 to conduct medical histories and document medical records, mirroring the tasks performed by 5 junior residents in identical scenarios. A total of 10 diverse scenarios were examined.

Results: Evaluation of the medical documentation created by laypersons with ChatGPT assistance and those created by junior residents was conducted by 2 senior emergency physicians using audio recordings and the final medical records. The assessment used the Objective Structured Clinical Examination benchmarks in Taiwan as a reference. ChatGPT-4.0 exhibited substantial enhancements over its predecessor and met or exceeded the performance of human counterparts in terms of both checklist and global assessment scores. Although the overall quality of human consultations remained higher, ChatGPT-4.0's proficiency in medical documentation was notably promising.

Conclusions: The performance of ChatGPT 4.0 was on par with that of human participants in Objective Structured Clinical Examination evaluations, signifying its potential in medical history and medical record documentation. Despite this, the superiority of human consultations in terms of quality was evident. The study underscores both the promise and the current limitations of LLMs in the realm of clinical practice.

Source journal: JMIR Medical Education (Social Sciences – Education)
CiteScore: 6.90 · Self-citation rate: 5.60% · Articles per year: 54 · Review time: 8 weeks