Evaluating the Accuracy, Comprehensiveness, and Validity of ChatGPT Compared to Evidence-Based Sources Regarding Common Surgical Conditions: Surgeons' Perspectives.
Hazem Nasef, Heli Patel, Quratulain Amin, Samuel Baum, Asanthi Ratnasekera, Darwin Ang, William S Havron, Don Nakayama, Adel Elkbuli
{"title":"Evaluating the Accuracy, Comprehensiveness, and Validity of ChatGPT Compared to Evidence-Based Sources Regarding Common Surgical Conditions: Surgeons' Perspectives.","authors":"Hazem Nasef, Heli Patel, Quratulain Amin, Samuel Baum, Asanthi Ratnasekera, Darwin Ang, William S Havron, Don Nakayama, Adel Elkbuli","doi":"10.1177/00031348241256075","DOIUrl":null,"url":null,"abstract":"<p><p>BackgroundThis study aims to assess the accuracy, comprehensiveness, and validity of ChatGPT compared to evidence-based sources regarding the diagnosis and management of common surgical conditions by surveying the perceptions of U.S. board-certified practicing surgeons.MethodsAn anonymous cross-sectional survey was distributed to U.S. practicing surgeons from June 2023 to March 2024. The survey comprised 94 multiple-choice questions evaluating diagnostic and management information for five common surgical conditions from evidence-based sources or generated by ChatGPT. Statistical analysis included descriptive statistics and paired-sample t-tests.ResultsParticipating surgeons were primarily aged 40-50 years (43%), male (86%), White (57%), and had 5-10 years or >15 years of experience (86%). The majority of surgeons had no prior experience with ChatGPT in surgical practice (86%). For material discussing both acute cholecystitis and upper gastrointestinal hemorrhage, evidence-based sources were rated as significantly more comprehensive (3.57 (±.535) vs 2.00 (±1.16), <i>P</i> = .025) (4.14 (±.69) vs 2.43 (±.98), <i>P</i> < .001) and valid (3.71 (±.488) vs 2.86 (±1.07), <i>P</i> = .045) (3.71 (±.76) vs 2.71 (±.95) <i>P</i> = .038) than ChatGPT. However, there was no significant difference in accuracy between the two sources (3.71 vs 3.29, <i>P</i> = .289) (3.57 vs 2.71, <i>P</i> = .111).ConclusionSurveyed U.S. board-certified practicing surgeons rated evidence-based sources as significantly more comprehensive and valid compared to ChatGPT across the majority of surveyed surgical conditions. However, there was no significant difference in accuracy between the sources across the majority of surveyed conditions. While ChatGPT may offer potential benefits in surgical practice, further refinement and validation are necessary to enhance its utility and acceptance among surgeons.</p>","PeriodicalId":7782,"journal":{"name":"American Surgeon","volume":" ","pages":"325-335"},"PeriodicalIF":1.0000,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"American Surgeon","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1177/00031348241256075","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/5/25 0:00:00","PubModel":"Epub","JCR":"Q3","JCRName":"SURGERY","Score":null,"Total":0}
Abstract
Background: This study aims to assess the accuracy, comprehensiveness, and validity of ChatGPT compared to evidence-based sources regarding the diagnosis and management of common surgical conditions by surveying the perceptions of U.S. board-certified practicing surgeons.

Methods: An anonymous cross-sectional survey was distributed to U.S. practicing surgeons from June 2023 to March 2024. The survey comprised 94 multiple-choice questions evaluating diagnostic and management information for five common surgical conditions, drawn either from evidence-based sources or generated by ChatGPT. Statistical analysis included descriptive statistics and paired-sample t-tests.

Results: Participating surgeons were primarily aged 40-50 years (43%), male (86%), and White (57%), and most had either 5-10 or more than 15 years of experience (86%). The majority (86%) had no prior experience with ChatGPT in surgical practice. For material on acute cholecystitis and upper gastrointestinal hemorrhage, respectively, evidence-based sources were rated as significantly more comprehensive (3.57 ± 0.535 vs 2.00 ± 1.16, P = .025; 4.14 ± 0.69 vs 2.43 ± 0.98, P < .001) and more valid (3.71 ± 0.488 vs 2.86 ± 1.07, P = .045; 3.71 ± 0.76 vs 2.71 ± 0.95, P = .038) than ChatGPT. However, there was no significant difference in accuracy between the two sources (3.71 vs 3.29, P = .289; 3.57 vs 2.71, P = .111).

Conclusion: Surveyed U.S. board-certified practicing surgeons rated evidence-based sources as significantly more comprehensive and valid than ChatGPT for the majority of surveyed surgical conditions; however, accuracy ratings did not differ significantly for most conditions. While ChatGPT may offer potential benefits in surgical practice, further refinement and validation are necessary to enhance its utility and acceptance among surgeons.
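The paired-sample t-tests reported above compare each surgeon's rating of the same condition's material from the two sources, so the observations are paired rather than independent. The following is a minimal sketch of such a comparison in Python using scipy.stats.ttest_rel; the seven ratings shown are hypothetical illustrations on the survey's 1-5 scale, not the study's data.

```python
# Minimal sketch of a paired-sample t-test like the one described
# in Methods, via scipy.stats.ttest_rel. All ratings below are
# hypothetical illustrations, NOT the study's data.
from scipy import stats

# Each index is one surgeon rating the SAME material from both
# sources, which is what makes the samples paired.
evidence_based_ratings = [4, 4, 3, 4, 3, 3, 4]  # hypothetical 1-5 ratings
chatgpt_ratings = [2, 1, 3, 2, 1, 3, 2]         # hypothetical 1-5 ratings

t_stat, p_value = stats.ttest_rel(evidence_based_ratings, chatgpt_ratings)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

A p-value below .05 in such a test would indicate a statistically significant difference between the paired ratings, as reported for the comprehensiveness and validity comparisons in the Results.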
About the Journal
The American Surgeon is a monthly peer-reviewed journal published by the Southeastern Surgical Congress. Its area of concentration is clinical general surgery, as defined by the content areas of the American Board of Surgery: alimentary tract (including bariatric surgery), abdomen and its contents, breast, skin and soft tissue, endocrine system, solid organ transplantation, pediatric surgery, surgical critical care, surgical oncology (including head and neck surgery), trauma and emergency surgery, and vascular surgery.