Artificial Intelligence in Hand and Upper Extremity Surgery Education: Accuracy and Validity of ChatGPT-4o Versus UpToDate as a Learning Tool for Trainees.

Eplasty. Published: 2025-05-14. eCollection date: 2025-01-01.

Caleb Bercu, Brianna Rosner, Aneeq Chaudhry, Hannah Korah, Isabel Bernal, Aaron Berger

Eplasty, volume 25, e17. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12257968/pdf/


Abstract

Background: The use of artificial intelligence (AI) in medical education has risen rapidly. Trainees can ask ChatGPT-4o (OpenAI) clinical questions and receive management recommendations. Previous studies have assessed the accuracy of ChatGPT, but none have examined hand and upper extremity surgery. This study aimed to evaluate the accuracy of ChatGPT-4o compared to UpToDate (Wolters Kluwer) and categorize the validity of sources provided by ChatGPT-4o.

Methods: Five hand and upper extremity surgery cases were entered into ChatGPT-4o. An UpToDate article was selected for each case. Two hand surgeons and 5 medical students completed a survey comparing the resources. Resources were rated on a scale from 1 to 3, with 1 indicating incomplete information and not useful; 2 indicating semi-complete information and somewhat useful; and 3 indicating a complete answer and useful for management. ChatGPT-4o references were scored on a validity scale of 0 to 2.

Results: Hand and upper extremity surgeons rated ChatGPT-4o and UpToDate as semi-complete and somewhat useful, with median scores of 2.00 and 2.50, respectively. No significant differences were found between resources. Medical students found ChatGPT to provide semi-complete information and be somewhat useful overall, and rated UpToDate more often as providing a complete answer and being useful. However, no statistically significant differences were found between the resource ratings. Of the 25 references provided by ChatGPT, 28% were accurate, 6% were somewhat accurate, and 66% were inaccurate.

Conclusions: The findings indicate overall comparable perceived usefulness of ChatGPT-4o and UpToDate by hand/upper extremity surgeons and trainees. ChatGPT-4o holds promise as an educational tool; however, accuracy concerns remain.

