Evaluating the Accuracy, Comprehensiveness, and Validity of ChatGPT Compared to Evidence-Based Sources Regarding Common Surgical Conditions: Surgeons' Perspectives.
Hazem Nasef, Heli Patel, Quratulain Amin, Samuel Baum, Asanthi Ratnasekera, Darwin Ang, William S Havron, Don Nakayama, Adel Elkbuli
{"title":"Evaluating the Accuracy, Comprehensiveness, and Validity of ChatGPT Compared to Evidence-Based Sources Regarding Common Surgical Conditions: Surgeons' Perspectives.","authors":"Hazem Nasef, Heli Patel, Quratulain Amin, Samuel Baum, Asanthi Ratnasekera, Darwin Ang, William S Havron, Don Nakayama, Adel Elkbuli","doi":"10.1177/00031348241256075","DOIUrl":null,"url":null,"abstract":"<p><p>BackgroundThis study aims to assess the accuracy, comprehensiveness, and validity of ChatGPT compared to evidence-based sources regarding the diagnosis and management of common surgical conditions by surveying the perceptions of U.S. board-certified practicing surgeons.MethodsAn anonymous cross-sectional survey was distributed to U.S. practicing surgeons from June 2023 to March 2024. The survey comprised 94 multiple-choice questions evaluating diagnostic and management information for five common surgical conditions from evidence-based sources or generated by ChatGPT. Statistical analysis included descriptive statistics and paired-sample t-tests.ResultsParticipating surgeons were primarily aged 40-50 years (43%), male (86%), White (57%), and had 5-10 years or >15 years of experience (86%). The majority of surgeons had no prior experience with ChatGPT in surgical practice (86%). For material discussing both acute cholecystitis and upper gastrointestinal hemorrhage, evidence-based sources were rated as significantly more comprehensive (3.57 (±.535) vs 2.00 (±1.16), <i>P</i> = .025) (4.14 (±.69) vs 2.43 (±.98), <i>P</i> < .001) and valid (3.71 (±.488) vs 2.86 (±1.07), <i>P</i> = .045) (3.71 (±.76) vs 2.71 (±.95) <i>P</i> = .038) than ChatGPT. However, there was no significant difference in accuracy between the two sources (3.71 vs 3.29, <i>P</i> = .289) (3.57 vs 2.71, <i>P</i> = .111).ConclusionSurveyed U.S. board-certified practicing surgeons rated evidence-based sources as significantly more comprehensive and valid compared to ChatGPT across the majority of surveyed surgical conditions. However, there was no significant difference in accuracy between the sources across the majority of surveyed conditions. While ChatGPT may offer potential benefits in surgical practice, further refinement and validation are necessary to enhance its utility and acceptance among surgeons.</p>","PeriodicalId":7782,"journal":{"name":"American Surgeon","volume":" ","pages":"325-335"},"PeriodicalIF":1.0000,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"American Surgeon","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1177/00031348241256075","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/5/25 0:00:00","PubModel":"Epub","JCR":"Q3","JCRName":"SURGERY","Score":null,"Total":0}
Abstract
Background: This study aims to assess the accuracy, comprehensiveness, and validity of ChatGPT compared to evidence-based sources regarding the diagnosis and management of common surgical conditions by surveying the perceptions of U.S. board-certified practicing surgeons.

Methods: An anonymous cross-sectional survey was distributed to U.S. practicing surgeons from June 2023 to March 2024. The survey comprised 94 multiple-choice questions evaluating diagnostic and management information for five common surgical conditions, drawn either from evidence-based sources or generated by ChatGPT. Statistical analysis included descriptive statistics and paired-sample t-tests.

Results: Participating surgeons were primarily aged 40-50 years (43%), male (86%), and White (57%), and most had either 5-10 or more than 15 years of experience (86%). The majority (86%) had no prior experience with ChatGPT in surgical practice. For material on acute cholecystitis and upper gastrointestinal hemorrhage, respectively, evidence-based sources were rated as significantly more comprehensive (3.57 ± 0.535 vs 2.00 ± 1.16, P = .025; 4.14 ± 0.69 vs 2.43 ± 0.98, P < .001) and more valid (3.71 ± 0.488 vs 2.86 ± 1.07, P = .045; 3.71 ± 0.76 vs 2.71 ± 0.95, P = .038) than ChatGPT. However, there was no significant difference in accuracy between the two sources (3.71 vs 3.29, P = .289; 3.57 vs 2.71, P = .111).

Conclusion: Surveyed U.S. board-certified practicing surgeons rated evidence-based sources as significantly more comprehensive and valid than ChatGPT for the majority of surveyed surgical conditions; however, accuracy ratings did not differ significantly for most conditions. While ChatGPT may offer potential benefits in surgical practice, further refinement and validation are necessary to enhance its utility and acceptance among surgeons.
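The paired-sample t-tests reported above compare each surgeon's rating of the same condition's material from the two sources, so the observations are paired rather than independent. The following is a minimal sketch of such a comparison in Python using scipy.stats.ttest_rel; the seven ratings shown are hypothetical illustrations on the survey's 1-5 scale, not the study's data.

```python
# Minimal sketch of a paired-sample t-test like the one described
# in Methods, via scipy.stats.ttest_rel. All ratings below are
# hypothetical illustrations, NOT the study's data.
from scipy import stats

# Each index is one surgeon rating the SAME material from both
# sources, which is what makes the samples paired.
evidence_based_ratings = [4, 4, 3, 4, 3, 3, 4]  # hypothetical 1-5 ratings
chatgpt_ratings = [2, 1, 3, 2, 1, 3, 2]         # hypothetical 1-5 ratings

t_stat, p_value = stats.ttest_rel(evidence_based_ratings, chatgpt_ratings)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

A p-value below .05 in such a test would indicate a statistically significant difference between the paired ratings, as reported for the comprehensiveness and validity comparisons in the Results.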
About the Journal
The American Surgeon is a monthly peer-reviewed journal published by the Southeastern Surgical Congress. Its area of concentration is clinical general surgery, as defined by the content areas of the American Board of Surgery: alimentary tract (including bariatric surgery), abdomen and its contents, breast, skin and soft tissue, endocrine system, solid organ transplantation, pediatric surgery, surgical critical care, surgical oncology (including head and neck surgery), trauma and emergency surgery, and vascular surgery.