Allen Ao Guo, Ashan Canagasingham, Krishan Rasiah, Venu Chalasani, Julie Mundy, Amanda Chung
{"title":"人工智能在外科教育中日益重要的作用:ChatGPT承担澳大利亚通用外科科学考试。","authors":"Allen Ao Guo, Ashan Canagasingham, Krishan Rasiah, Venu Chalasani, Julie Mundy, Amanda Chung","doi":"10.1111/ans.70186","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Large language models have undergone vast development in recent years. The advent of large language models such as ChatGPT may play an important role in enhancing future medical education.</p><p><strong>Methods: </strong>To evaluate the accuracy and performance of ChatGPT in the Generic Surgical Sciences Examination, we constructed a sample examination used to assess ChatGPT. Questions were sourced from a past questions bank and formatted to mirror the structure and layout of the examination. The performance of ChatGPT was assessed based on a predefined answer key recorded earlier.</p><p><strong>Results: </strong>ChatGPT scored a total of 468 marks out of a maximum total of 644 marks, scoring a final percentage of 72.7% across all sections tested. ChatGPT performed best in the physiology section, scoring 77.9%, followed by pathology, scoring 75.0%, and scored lowest in the anatomy section with 66.3%. When scoring was analyzed by question type, it was identified that ChatGPT performed best in the type \"A\" questions (multiple choice), scoring a total of 75%, which was followed closely by its performance in type \"X\" questions (true or false), where ChatGPT scored 73.2%. However, ChatGPT only scored 43.8% when answering type \"B\" questions (establishing a relationship between two statements).</p><p><strong>Conclusion: </strong>Our results demonstrate that ChatGPT completed the Generic Surgical Sciences Examination with accuracy exceeding the required threshold for a pass in this examination. However, the large language model struggled with certain question types and sections. Overall, further research regarding the utility of ChatGPT in surgical education is required, and caution should be exercised with its use, as it remains in its infancy stages.</p>","PeriodicalId":8158,"journal":{"name":"ANZ Journal of Surgery","volume":" ","pages":""},"PeriodicalIF":1.6000,"publicationDate":"2025-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"The Growing Role of Artificial Intelligence in Surgical Education: ChatGPT Undertakes the Australian Generic Surgical Sciences Examination.\",\"authors\":\"Allen Ao Guo, Ashan Canagasingham, Krishan Rasiah, Venu Chalasani, Julie Mundy, Amanda Chung\",\"doi\":\"10.1111/ans.70186\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Large language models have undergone vast development in recent years. The advent of large language models such as ChatGPT may play an important role in enhancing future medical education.</p><p><strong>Methods: </strong>To evaluate the accuracy and performance of ChatGPT in the Generic Surgical Sciences Examination, we constructed a sample examination used to assess ChatGPT. Questions were sourced from a past questions bank and formatted to mirror the structure and layout of the examination. The performance of ChatGPT was assessed based on a predefined answer key recorded earlier.</p><p><strong>Results: </strong>ChatGPT scored a total of 468 marks out of a maximum total of 644 marks, scoring a final percentage of 72.7% across all sections tested. 
ChatGPT performed best in the physiology section, scoring 77.9%, followed by pathology, scoring 75.0%, and scored lowest in the anatomy section with 66.3%. When scoring was analyzed by question type, it was identified that ChatGPT performed best in the type \\\"A\\\" questions (multiple choice), scoring a total of 75%, which was followed closely by its performance in type \\\"X\\\" questions (true or false), where ChatGPT scored 73.2%. However, ChatGPT only scored 43.8% when answering type \\\"B\\\" questions (establishing a relationship between two statements).</p><p><strong>Conclusion: </strong>Our results demonstrate that ChatGPT completed the Generic Surgical Sciences Examination with accuracy exceeding the required threshold for a pass in this examination. However, the large language model struggled with certain question types and sections. Overall, further research regarding the utility of ChatGPT in surgical education is required, and caution should be exercised with its use, as it remains in its infancy stages.</p>\",\"PeriodicalId\":8158,\"journal\":{\"name\":\"ANZ Journal of Surgery\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":1.6000,\"publicationDate\":\"2025-05-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ANZ Journal of Surgery\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1111/ans.70186\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"SURGERY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ANZ Journal of Surgery","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1111/ans.70186","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"SURGERY","Score":null,"Total":0}
The Growing Role of Artificial Intelligence in Surgical Education: ChatGPT Undertakes the Australian Generic Surgical Sciences Examination.
Background: Large language models have developed rapidly in recent years, and the advent of models such as ChatGPT may play an important role in enhancing future medical education.
Methods: To evaluate the accuracy and performance of ChatGPT in the Generic Surgical Sciences Examination, we constructed a sample examination from a bank of past questions, formatted to mirror the structure and layout of the real examination. ChatGPT's responses were marked against a predefined answer key.
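The paper does not publish its marking pipeline, but the procedure it describes (checking each response against a predefined key) is straightforward to express. Below is a minimal Python sketch, assuming one mark per item; the data structures (answer_key, responses) and their contents are hypothetical illustrations, not the authors' actual materials.

# Minimal sketch of marking model answers against a predefined key.
# answer_key and responses are hypothetical; one mark per item is assumed.
from collections import defaultdict

def score_exam(answer_key: dict, responses: dict) -> dict:
    """Return (marks scored, maximum marks, percentage) per (section, question type)."""
    totals = defaultdict(lambda: [0, 0])  # (section, qtype) -> [scored, maximum]
    for qid, (section, qtype, correct) in answer_key.items():
        totals[(section, qtype)][1] += 1
        if responses.get(qid) == correct:
            totals[(section, qtype)][0] += 1
    return {key: (scored, maximum, 100 * scored / maximum)
            for key, (scored, maximum) in totals.items()}

# Hypothetical usage:
answer_key = {"q1": ("anatomy", "A", "c"), "q2": ("physiology", "X", "true")}
responses = {"q1": "c", "q2": "false"}
print(score_exam(answer_key, responses))

Each key entry in this sketch records the question's section, its type ("A", "X" or "B"), and the correct answer, so marks can be aggregated per section and per question type in the way the Results reports them.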
Results: ChatGPT scored 468 of a maximum 644 marks, a final percentage of 72.7% across all sections tested. It performed best in the physiology section (77.9%), followed by pathology (75.0%), and lowest in the anatomy section (66.3%). When scoring was analyzed by question type, ChatGPT performed best on type "A" questions (multiple choice), scoring 75%, followed closely by type "X" questions (true or false) at 73.2%. However, it scored only 43.8% on type "B" questions (establishing a relationship between two statements).
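As a quick arithmetic check, the overall percentage follows directly from the reported marks:

scored, maximum = 468, 644            # totals reported in the Results
print(f"{100 * scored / maximum:.1f}%")  # prints 72.7%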
Conclusion: Our results demonstrate that ChatGPT completed the Generic Surgical Sciences Examination with an accuracy exceeding the pass threshold. However, the model struggled with certain question types and sections. Overall, further research into the utility of ChatGPT in surgical education is required, and caution should be exercised in its use, as the technology remains in its infancy.
Journal introduction:
ANZ Journal of Surgery is published by Wiley on behalf of the Royal Australasian College of Surgeons to provide a medium for the publication of peer-reviewed original contributions related to clinical practice and/or research in all fields of surgery and related disciplines. It also provides a programme of continuing education for surgeons. All articles are peer-reviewed by at least two researchers with expertise in the field of the submitted paper.