Exploring the role of artificial intelligence in Turkish orthopedic progression exams.

Gokhan Ayik, Ulas Can Kolac, Taha Aksoy, Abdurrahman Yilmaz, Mazlum Veysel Sili, Mazhar Tokgozoglu, Gazi Huri
{"title":"探索人工智能在土耳其骨科进展考试中的作用。","authors":"Gokhan Ayik, Ulas Can Kolac, Taha Aksoy, Abdurrahman Yilmaz, Mazlum Veysel Sili, Mazhar Tokgozoglu, Gazi Huri","doi":"10.5152/j.aott.2025.24090","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>The aim of this study was to evaluate and compare the performance of the artificial intelligence (AI) models ChatGPT-3.5, ChatGPT-4, and Gemini on the Turkish Specialization Training and Development Examination (UEGS) to determine their utility in medical education and their potential to improve patient care.</p><p><strong>Methods: </strong>This retrospective study analyzed responses of ChatGPT-3.5, ChatGPT-4, and Gemini to 1000 true or false questions from UEGS administered over 5 years (2018-2023). Questions, encompassing 9 orthopedic subspecialties, were categorized by 2 independent residents, with discrepancies resolved by a senior author. Artificial intelligence models were restarted for each query to prevent data retention. Performance was evaluated by calculating net scores and comparing them to orthopedic resident scores obtained from the Turkish Orthopedics and Traumatology Education Council (TOTEK) database. Statistical analyses included chi-squared tests, Bonferroni-adjusted Z tests, Cochran's Q test, and receiver operating characteristic (ROC) analysis to determine the optimal question length for AI accuracy. All AI responses were generated independently without retaining prior information.</p><p><strong>Results: </strong>Significant di!erences in AI tool accuracy were observed across di!erent years and subspecialties (P < .001). ChatGPT-4 consistently outperformed other models, achieving the highest overall accuracy (95% in specific subspecialties). Notably, ChatGPT-4 demonstrated superior performance in Basic and General Orthopedics and Foot and Ankle Surgery, while Gemini and ChatGPT-3.5 showed variability in accuracy across topics and years. Receiver operating characteristic analysis revealed a significant relationship between shorter letter counts and higher accuracy for ChatGPT-4 (P=.002). ChatGPT-4 showed significant negative correlations between letter count and accuracy across all years (r=\"0.099, P=.002), outperformed residents in basic and general orthopedics (P=.015) and trauma (P=.012), unlike other AI models.</p><p><strong>Conclusion: </strong>The findings underscore the advancing role of AI in the medical field, with ChatGPT-4 demonstrating significant potential as a tool for medical education and clinical decision-making. 
Continuous evaluation and refinement of AI technologies are essential to enhance their educational and clinical impact.</p>","PeriodicalId":93854,"journal":{"name":"Acta orthopaedica et traumatologica turcica","volume":"59 1","pages":"18-26"},"PeriodicalIF":0.0000,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11992947/pdf/","citationCount":"0","resultStr":"{\"title\":\"Exploring the role of artificial intelligence in Turkish orthopedic progression exams.\",\"authors\":\"Gokhan Ayik, Ulas Can Kolac, Taha Aksoy, Abdurrahman Yilmaz, Mazlum Veysel Sili, Mazhar Tokgozoglu, Gazi Huri\",\"doi\":\"10.5152/j.aott.2025.24090\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Objective: </strong>The aim of this study was to evaluate and compare the performance of the artificial intelligence (AI) models ChatGPT-3.5, ChatGPT-4, and Gemini on the Turkish Specialization Training and Development Examination (UEGS) to determine their utility in medical education and their potential to improve patient care.</p><p><strong>Methods: </strong>This retrospective study analyzed responses of ChatGPT-3.5, ChatGPT-4, and Gemini to 1000 true or false questions from UEGS administered over 5 years (2018-2023). Questions, encompassing 9 orthopedic subspecialties, were categorized by 2 independent residents, with discrepancies resolved by a senior author. Artificial intelligence models were restarted for each query to prevent data retention. Performance was evaluated by calculating net scores and comparing them to orthopedic resident scores obtained from the Turkish Orthopedics and Traumatology Education Council (TOTEK) database. Statistical analyses included chi-squared tests, Bonferroni-adjusted Z tests, Cochran's Q test, and receiver operating characteristic (ROC) analysis to determine the optimal question length for AI accuracy. All AI responses were generated independently without retaining prior information.</p><p><strong>Results: </strong>Significant di!erences in AI tool accuracy were observed across di!erent years and subspecialties (P < .001). ChatGPT-4 consistently outperformed other models, achieving the highest overall accuracy (95% in specific subspecialties). Notably, ChatGPT-4 demonstrated superior performance in Basic and General Orthopedics and Foot and Ankle Surgery, while Gemini and ChatGPT-3.5 showed variability in accuracy across topics and years. Receiver operating characteristic analysis revealed a significant relationship between shorter letter counts and higher accuracy for ChatGPT-4 (P=.002). ChatGPT-4 showed significant negative correlations between letter count and accuracy across all years (r=\\\"0.099, P=.002), outperformed residents in basic and general orthopedics (P=.015) and trauma (P=.012), unlike other AI models.</p><p><strong>Conclusion: </strong>The findings underscore the advancing role of AI in the medical field, with ChatGPT-4 demonstrating significant potential as a tool for medical education and clinical decision-making. 
Continuous evaluation and refinement of AI technologies are essential to enhance their educational and clinical impact.</p>\",\"PeriodicalId\":93854,\"journal\":{\"name\":\"Acta orthopaedica et traumatologica turcica\",\"volume\":\"59 1\",\"pages\":\"18-26\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-03-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11992947/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Acta orthopaedica et traumatologica turcica\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.5152/j.aott.2025.24090\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Acta orthopaedica et traumatologica turcica","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5152/j.aott.2025.24090","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Objective: The aim of this study was to evaluate and compare the performance of the artificial intelligence (AI) models ChatGPT-3.5, ChatGPT-4, and Gemini on the Turkish Specialization Training and Development Examination (UEGS) to determine their utility in medical education and their potential to improve patient care.

Methods: This retrospective study analyzed responses of ChatGPT-3.5, ChatGPT-4, and Gemini to 1000 true or false questions from UEGS administered over 5 years (2018-2023). Questions, encompassing 9 orthopedic subspecialties, were categorized by 2 independent residents, with discrepancies resolved by a senior author. Artificial intelligence models were restarted for each query to prevent data retention. Performance was evaluated by calculating net scores and comparing them to orthopedic resident scores obtained from the Turkish Orthopedics and Traumatology Education Council (TOTEK) database. Statistical analyses included chi-squared tests, Bonferroni-adjusted Z tests, Cochran's Q test, and receiver operating characteristic (ROC) analysis to determine the optimal question length for AI accuracy. All AI responses were generated independently without retaining prior information.
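
The statistical workflow described in the Methods can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the data are simulated, the mild length penalty on accuracy and the Youden's J cutoff rule are assumptions.

```python
# Illustrative sketch of two analyses named in the Methods: Cochran's Q test
# across the three models answering the same questions, and ROC analysis of
# question length vs. ChatGPT-4 correctness. All data are simulated
# placeholders; the length-penalty model and Youden's J cutoff are assumptions.
import numpy as np
from statsmodels.stats.contingency_tables import cochrans_q
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
n_questions = 1000

length = rng.integers(40, 400, n_questions)             # simulated letter counts
p_gpt4 = np.clip(0.95 - 0.0005 * (length - 40), 0, 1)   # assumed mild length penalty
answers = np.column_stack([
    rng.binomial(1, 0.70, n_questions),  # ChatGPT-3.5 (placeholder accuracy)
    rng.binomial(1, p_gpt4),             # ChatGPT-4
    rng.binomial(1, 0.75, n_questions),  # Gemini
])

# Cochran's Q: do the three related proportions of correct answers differ?
q = cochrans_q(answers)
print(f"Cochran's Q = {q.statistic:.2f}, P = {q.pvalue:.4f}")

# ROC: can letter count separate correct from incorrect ChatGPT-4 answers?
# Shorter questions are scored higher, so the score is the negated length.
fpr, tpr, thr = roc_curve(answers[:, 1], -length)
optimal_len = -thr[np.argmax(tpr - fpr)]                 # Youden's J cutoff
print(f"Optimal letter-count cutoff ≈ {optimal_len:.0f}")
```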

Results: Significant differences in AI tool accuracy were observed across different years and subspecialties (P < .001). ChatGPT-4 consistently outperformed the other models, achieving the highest overall accuracy (95% in specific subspecialties). Notably, ChatGPT-4 demonstrated superior performance in Basic and General Orthopedics and Foot and Ankle Surgery, while Gemini and ChatGPT-3.5 showed variability in accuracy across topics and years. Receiver operating characteristic analysis revealed a significant relationship between shorter letter counts and higher accuracy for ChatGPT-4 (P=.002). ChatGPT-4 showed a significant negative correlation between letter count and accuracy across all years (r=-0.099, P=.002) and, unlike the other AI models, outperformed residents in basic and general orthopedics (P=.015) and trauma (P=.012).
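
The letter-count correlation and the model-versus-resident accuracy comparisons reported above could be computed along these lines. The point-biserial correlation and the Bonferroni-corrected two-proportion z-test are standard choices for such data, but the counts and accuracies below are invented placeholders rather than the study's values.

```python
# Hedged sketch of the Results-section statistics: the point-biserial
# correlation between letter count and correctness, and a two-proportion
# z-test comparing a model against residents, Bonferroni-corrected. The
# counts and accuracies below are invented placeholders, not study data.
import numpy as np
from scipy.stats import pointbiserialr
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(1)
n = 1000
length = rng.integers(40, 400, n)                        # simulated letter counts
p_correct = np.clip(0.95 - 0.0005 * (length - 40), 0, 1)
gpt4_correct = rng.binomial(1, p_correct)

r, p = pointbiserialr(gpt4_correct, length)
print(f"point-biserial r = {r:.3f}, P = {p:.3f}")        # study reports r = -0.099

# ChatGPT-4 vs. residents on a (hypothetical) 120-question subspecialty subset.
gpt4_subset = (114, 120)        # correct answers, questions attempted
resident_subset = (101, 120)
z, p_raw = proportions_ztest(
    count=[gpt4_subset[0], resident_subset[0]],
    nobs=[gpt4_subset[1], resident_subset[1]],
)
p_adj = min(p_raw * 3, 1.0)     # Bonferroni over the three AI-vs-resident tests
print(f"z = {z:.2f}, Bonferroni-adjusted P = {p_adj:.4f}")
```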

Conclusion: The findings underscore the advancing role of AI in the medical field, with ChatGPT-4 demonstrating significant potential as a tool for medical education and clinical decision-making. Continuous evaluation and refinement of AI technologies are essential to enhance their educational and clinical impact.
