Performance of large language models at the MRCS Part A: a tool for medical education?

Impact Factor 1.7 · CAS Zone 4 (Medicine) · JCR Q3 (Surgery)
A Yiu, K Lam
DOI: 10.1308/rcsann.2023.0085 · Annals of the Royal College of Surgeons of England, pp. 434-440 · Epub 1 December 2023; issue date 1 July 2025 · Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12208737/pdf/
Citations: 0

Abstract


Performance of large language models at the MRCS Part A: a tool for medical education?


Introduction: The Intercollegiate Membership of the Royal College of Surgeons examination (MRCS) Part A assesses generic surgical sciences and applied knowledge using 300 multiple-choice Single Best Answer items. Large Language Models (LLMs) are trained on vast amounts of text to generate natural language outputs, and applications in healthcare and medical education are rising.

Methods: Two LLMs, ChatGPT (OpenAI) and Bard (Google AI), were tested using 300 questions from a popular MRCS Part A question bank without/with need for justification (NJ/J). LLM outputs were scored according to accuracy, concordance and insight.
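The scoring protocol described above can be sketched as follows. This is a minimal illustration only: `score_output`, `summarise` and the output fields are hypothetical names (the paper does not publish code), and in the study the concordance and insight judgements were made by human reviewers rather than computed automatically.

```python
def score_output(output, correct_answer):
    """Score one LLM output on the three criteria used in the study:
    accuracy, concordance and insight (field names are illustrative)."""
    return {
        "accurate": output["answer"] == correct_answer,
        "concordant": output["justification_supports_answer"],
        "insightful": output["adds_insight"],
    }


def summarise(scores):
    """Aggregate per-question scores into percentages per criterion."""
    n = len(scores)
    return {k: 100 * sum(s[k] for s in scores) / n for k in scores[0]}
```

Each of the 300 questions would be posed twice, once per encoding (NJ and J), and the resulting per-question scores aggregated separately for each model and encoding.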

Results: ChatGPT achieved 85.7%/84.3% accuracy for NJ/J encodings. Bard achieved 64%/64.3% accuracy for NJ/J encodings. ChatGPT and Bard displayed high levels of concordance for the NJ (95.3% and 81.7%) and J (93.7% and 79.7%) encodings, respectively. ChatGPT and Bard provided an insightful statement in >98% and >86% of outputs, respectively.
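As a quick sanity check on the reported accuracies: the abstract gives only percentages over the 300-item bank, but the per-question counts implied by rounding can be recovered (the counts below are inferred, not stated in the abstract).

```python
TOTAL = 300  # number of MRCS Part A questions in the bank

def pct(count: int) -> float:
    """Percentage of TOTAL, rounded to one decimal as reported."""
    return round(100 * count / TOTAL, 1)

# Counts consistent with the reported accuracies (inferred from rounding):
chatgpt_nj, chatgpt_j = 257, 253  # 85.7% / 84.3%
bard_nj, bard_j = 192, 193        # 64.0% / 64.3%
```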

Discussion: This study demonstrates that ChatGPT achieves passing-level accuracy at MRCS Part A, and both LLMs achieve high concordance and provide insightful responses to test questions. Instances of clinically inappropriate or inaccurate decision-making, incomplete appreciation of nuanced clinical scenarios and utilisation of out-of-date guidance were, however, noted. LLMs are accessible and time-efficient tools with access to vast clinical knowledge, and may reduce the emphasis on factual recall in medical education and assessment.

Conclusion: ChatGPT achieves passing-level accuracy for MRCS Part A with concordant and insightful outputs. Future applications of LLMs in healthcare must be cautious of hallucinations and incorrect reasoning but have the potential to develop AI-supported clinicians.

Source journal: CiteScore 2.40 · Self-citation rate 0.00% · Annual articles: 316
Journal description: The Annals of The Royal College of Surgeons of England is the official scholarly research journal of the Royal College of Surgeons and is published eight times a year, in January, February, March, April, May, July, September and November. The main aim of the journal is to publish high-quality, peer-reviewed papers that relate to all branches of surgery. The Annals also includes letters and comments, a regular technical section, controversial topics, CORESS feedback and book reviews. The editorial board is composed of experts from all the surgical specialties.