Allen Ao Guo, Ashan Canagasingham, Krishan Rasiah, Venu Chalasani, Julie Mundy, Amanda Chung
{"title":"人工智能在外科教育中日益重要的作用:ChatGPT承担澳大利亚通用外科科学考试。","authors":"Allen Ao Guo, Ashan Canagasingham, Krishan Rasiah, Venu Chalasani, Julie Mundy, Amanda Chung","doi":"10.1111/ans.70186","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Large language models have undergone vast development in recent years. The advent of large language models such as ChatGPT may play an important role in enhancing future medical education.</p><p><strong>Methods: </strong>To evaluate the accuracy and performance of ChatGPT in the Generic Surgical Sciences Examination, we constructed a sample examination used to assess ChatGPT. Questions were sourced from a past questions bank and formatted to mirror the structure and layout of the examination. The performance of ChatGPT was assessed based on a predefined answer key recorded earlier.</p><p><strong>Results: </strong>ChatGPT scored a total of 468 marks out of a maximum total of 644 marks, scoring a final percentage of 72.7% across all sections tested. ChatGPT performed best in the physiology section, scoring 77.9%, followed by pathology, scoring 75.0%, and scored lowest in the anatomy section with 66.3%. When scoring was analyzed by question type, it was identified that ChatGPT performed best in the type \"A\" questions (multiple choice), scoring a total of 75%, which was followed closely by its performance in type \"X\" questions (true or false), where ChatGPT scored 73.2%. However, ChatGPT only scored 43.8% when answering type \"B\" questions (establishing a relationship between two statements).</p><p><strong>Conclusion: </strong>Our results demonstrate that ChatGPT completed the Generic Surgical Sciences Examination with accuracy exceeding the required threshold for a pass in this examination. However, the large language model struggled with certain question types and sections. Overall, further research regarding the utility of ChatGPT in surgical education is required, and caution should be exercised with its use, as it remains in its infancy stages.</p>","PeriodicalId":8158,"journal":{"name":"ANZ Journal of Surgery","volume":" ","pages":""},"PeriodicalIF":1.6000,"publicationDate":"2025-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"The Growing Role of Artificial Intelligence in Surgical Education: ChatGPT Undertakes the Australian Generic Surgical Sciences Examination.\",\"authors\":\"Allen Ao Guo, Ashan Canagasingham, Krishan Rasiah, Venu Chalasani, Julie Mundy, Amanda Chung\",\"doi\":\"10.1111/ans.70186\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Large language models have undergone vast development in recent years. The advent of large language models such as ChatGPT may play an important role in enhancing future medical education.</p><p><strong>Methods: </strong>To evaluate the accuracy and performance of ChatGPT in the Generic Surgical Sciences Examination, we constructed a sample examination used to assess ChatGPT. Questions were sourced from a past questions bank and formatted to mirror the structure and layout of the examination. The performance of ChatGPT was assessed based on a predefined answer key recorded earlier.</p><p><strong>Results: </strong>ChatGPT scored a total of 468 marks out of a maximum total of 644 marks, scoring a final percentage of 72.7% across all sections tested. 
ChatGPT performed best in the physiology section, scoring 77.9%, followed by pathology, scoring 75.0%, and scored lowest in the anatomy section with 66.3%. When scoring was analyzed by question type, it was identified that ChatGPT performed best in the type \\\"A\\\" questions (multiple choice), scoring a total of 75%, which was followed closely by its performance in type \\\"X\\\" questions (true or false), where ChatGPT scored 73.2%. However, ChatGPT only scored 43.8% when answering type \\\"B\\\" questions (establishing a relationship between two statements).</p><p><strong>Conclusion: </strong>Our results demonstrate that ChatGPT completed the Generic Surgical Sciences Examination with accuracy exceeding the required threshold for a pass in this examination. However, the large language model struggled with certain question types and sections. Overall, further research regarding the utility of ChatGPT in surgical education is required, and caution should be exercised with its use, as it remains in its infancy stages.</p>\",\"PeriodicalId\":8158,\"journal\":{\"name\":\"ANZ Journal of Surgery\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":1.6000,\"publicationDate\":\"2025-05-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ANZ Journal of Surgery\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1111/ans.70186\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"SURGERY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ANZ Journal of Surgery","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1111/ans.70186","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"SURGERY","Score":null,"Total":0}
The Growing Role of Artificial Intelligence in Surgical Education: ChatGPT Undertakes the Australian Generic Surgical Sciences Examination.
Background: Large language models have developed rapidly in recent years, and the advent of models such as ChatGPT may play an important role in enhancing future medical education.
Methods: To evaluate the accuracy and performance of ChatGPT in the Generic Surgical Sciences Examination, we constructed a sample examination from a bank of past questions, formatted to mirror the structure and layout of the real examination. ChatGPT's responses were marked against a predefined answer key.
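The paper does not publish its marking pipeline, but the procedure it describes (checking each response against a predefined key) is straightforward to express. Below is a minimal Python sketch, assuming one mark per item; the data structures (answer_key, responses) and their contents are hypothetical illustrations, not the authors' actual materials.

# Minimal sketch of marking model answers against a predefined key.
# answer_key and responses are hypothetical; one mark per item is assumed.
from collections import defaultdict

def score_exam(answer_key: dict, responses: dict) -> dict:
    """Return (marks scored, maximum marks, percentage) per (section, question type)."""
    totals = defaultdict(lambda: [0, 0])  # (section, qtype) -> [scored, maximum]
    for qid, (section, qtype, correct) in answer_key.items():
        totals[(section, qtype)][1] += 1
        if responses.get(qid) == correct:
            totals[(section, qtype)][0] += 1
    return {key: (scored, maximum, 100 * scored / maximum)
            for key, (scored, maximum) in totals.items()}

# Hypothetical usage:
answer_key = {"q1": ("anatomy", "A", "c"), "q2": ("physiology", "X", "true")}
responses = {"q1": "c", "q2": "false"}
print(score_exam(answer_key, responses))

Each key entry in this sketch records the question's section, its type ("A", "X" or "B"), and the correct answer, so marks can be aggregated per section and per question type in the way the Results reports them.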
Results: ChatGPT scored 468 of a maximum 644 marks, a final percentage of 72.7% across all sections tested. It performed best in the physiology section (77.9%), followed by pathology (75.0%), and lowest in the anatomy section (66.3%). When scoring was analyzed by question type, ChatGPT performed best on type "A" questions (multiple choice), scoring 75%, followed closely by type "X" questions (true or false) at 73.2%. However, it scored only 43.8% on type "B" questions (establishing a relationship between two statements).
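As a quick arithmetic check, the overall percentage follows directly from the reported marks:

scored, maximum = 468, 644            # totals reported in the Results
print(f"{100 * scored / maximum:.1f}%")  # prints 72.7%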
Conclusion: Our results demonstrate that ChatGPT completed the Generic Surgical Sciences Examination with an accuracy exceeding the pass threshold. However, the model struggled with certain question types and sections. Overall, further research into the utility of ChatGPT in surgical education is required, and caution should be exercised in its use, as the technology remains in its infancy.
Journal introduction:
ANZ Journal of Surgery is published by Wiley on behalf of the Royal Australasian College of Surgeons to provide a medium for the publication of peer-reviewed original contributions related to clinical practice and/or research in all fields of surgery and related disciplines. It also provides a programme of continuing education for surgeons. All articles are peer-reviewed by at least two researchers with expertise in the field of the submitted paper.