Enhancements in artificial intelligence for medical examinations: A leap from ChatGPT 3.5 to ChatGPT 4.0 in the FRCS trauma & orthopaedics examination.

IF 2.3 · CAS Zone 4 (Medicine) · JCR Q2 (Surgery)

Surgeon-Journal of the Royal Colleges of Surgeons of Edinburgh and Ireland Pub Date : 2024-11-28 DOI:10.1016/j.surge.2024.11.008

Akib Majed Khan, Khaled Maher Sarraf, Ashley Iain Simpson


Citations: 0

Abstract


Enhancements in artificial intelligence for medical examinations: A leap from ChatGPT 3.5 to ChatGPT 4.0 in the FRCS trauma & orthopaedics examination.

Introduction: ChatGPT is a sophisticated AI model capable of generating human-like text based on the input it receives. ChatGPT 3.5 showed an inability to pass the FRCS (Tr&Orth) examination due to a lack of higher-order judgement in previous studies. Enhancements in ChatGPT 4.0 warrant an evaluation of its performance.

Methodology: Questions from the UK-based December 2022 In-Training examination were input into ChatGPT 3.5 and 4.0. Methodology from a prior study was replicated to maintain consistency, allowing for a direct comparison between the two model versions. The performance threshold remained at 65.8 %, aligning with the November 2022 sitting of Section 1 of the FRCS (Tr&Orth).

Results: ChatGPT 4.0 achieved a passing score (73.9 %), indicating an improvement in its ability to analyse clinical information and make decisions reflective of a competent trauma and orthopaedic consultant. Compared to ChatGPT 4.0, version 3.5 scored 38.1 % lower, which represents a significant difference (p < 0.0001; Chi-square). The breakdown by subspecialty further demonstrated version 4.0's enhanced understanding and application in complex clinical scenarios. ChatGPT 4.0 also showed a statistically significant improvement in answering image-based questions (p = 0.0069) compared to its predecessor.
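
The abstract does not report question counts or say which form of the Chi-square test was applied. As a rough illustration only, the sketch below shows how two models' correct/incorrect tallies could be compared with a Pearson Chi-square test on a 2x2 table. The function name `chi_square_2x2`, the counts in the usage lines, and the choice to omit a continuity correction are all assumptions, not details from the study.

```python
import math


def chi_square_2x2(a_correct, a_total, b_correct, b_total):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Rows are the two models, columns are correct / incorrect answers.
    Returns (statistic, p_value). Assumes each column total is non-zero.
    """
    observed = [
        [a_correct, a_total - a_correct],
        [b_correct, b_total - b_correct],
    ]
    row_totals = [a_total, b_total]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    grand_total = a_total + b_total

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under the null hypothesis of equal accuracy.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts, invented for illustration (NOT the study's data).
stat, p = chi_square_2x2(a_correct=80, a_total=120, b_correct=50, b_total=120)
print(f"chi-square = {stat:.3f}, p = {p:.5f}")
```

A small p-value from such a test indicates only that the two accuracy rates are unlikely to be equal. It does not by itself say whether either model clears a pass mark such as the 65.8 % threshold.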

Conclusion: ChatGPT 4.0's success in passing Section One of the FRCS (Tr&Orth) examination highlights the rapid evolution of AI technologies and their potential applications in healthcare and education.


Source journal

Surgeon-Journal of the Royal Colleges of Surgeons of Edinburgh and Ireland (Medicine: Surgery)

CiteScore

4.40

Self-citation rate

0.00%

Articles published

158

Review time

6-12 weeks

About the journal: Since its establishment in 2003, The Surgeon has established itself as one of the leading multidisciplinary surgical titles, both in print and online. The Surgeon is published for the worldwide surgical and dental communities. The goal of the Journal is to achieve wider national and international recognition, through a commitment to excellence in original research. In addition, both Colleges see the Journal as an important educational service, and consequently there is a particular focus on post-graduate development. Much of our educational role will continue to be achieved through publishing expanded review articles by leaders in their field. Articles in related areas to surgery and dentistry, such as healthcare management and education, are also welcomed. We aim to educate, entertain, give insight into new surgical techniques and technology, and provide a forum for debate and discussion.