Assessing Artificial Intelligence as a Diagnostic Support Tool for Surgical Admissions in the Emergency Department
Jamie Rice MBBCh, BAO, BA, MSc; Eoin Ó’Briain MBBCh, BAO, BA, MSc; Conor J. Kilkenny MB, BAO, BA, MCh, MRCS; Richard E. Hogan MB, BAO, BA, MCh, MRCS; Tom V. McIntyre MBBCh, BAO, BA, MSc, PGDip, MRCS; Dara Kavanagh MD, FRCSI; Paul C. Neary MD, FRCSI; James M. O’Riordan MD, FRCSI; Shaheel M. Sahebally MD, FRCSI
Journal of Surgical Education, vol. 82, no. 10, Article 103676. Published 2025-09-01. DOI: 10.1016/j.jsurg.2025.103676
https://www.sciencedirect.com/science/article/pii/S1931720425002570
Abstract
INTRODUCTION
Artificial intelligence (AI) is increasingly being used in healthcare for data analysis and decision support. This study assessed the potential of AI (specifically ChatGPT-4o) as a diagnostic support tool for surgical admissions in the Emergency Department and compared its accuracy with that of on-call surgical trainees.
METHODS
This was a single-institution retrospective study conducted in December 2023. Primary outcomes comprised the agreement and accuracy of diagnoses between AI and trainees. The secondary outcome measure was the similarity of treatment plans. Agreement was defined as the percentage of cases in which both AI and the surgical trainee concurred on the diagnosis or specific aspects of the management plan. Accuracy was defined in both groups as the proportion of provisional diagnoses that corresponded with the definitive diagnosis, as confirmed by subsequent imaging and/or clinical documentation.
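The two definitions above translate directly into code. The following is a minimal sketch, not the authors' analysis: the function names and the six-case label lists are hypothetical and are not study data. It also includes Cohen's kappa, the standard chance-corrected counterpart to raw percentage agreement.

```python
from collections import Counter


def accuracy(provisional, definitive):
    """Proportion of provisional diagnoses that match the definitive diagnosis."""
    matches = sum(p == d for p, d in zip(provisional, definitive))
    return matches / len(definitive)


def percent_agreement(rater_a, rater_b):
    """Proportion of cases in which two raters give the same label."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    # Expected agreement if both raters labelled independently
    # with their own observed label frequencies.
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / n**2
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels for illustration only (NOT data from the study).
definitive = ["appendicitis", "cholecystitis", "appendicitis",
              "diverticulitis", "appendicitis", "cholecystitis"]
ai = ["appendicitis", "cholecystitis", "appendicitis",
      "appendicitis", "appendicitis", "cholecystitis"]
trainee = ["appendicitis", "cholecystitis", "diverticulitis",
           "diverticulitis", "appendicitis", "cholecystitis"]

print(accuracy(ai, definitive), accuracy(trainee, definitive))
print(percent_agreement(ai, trainee), cohen_kappa(ai, trainee))
```

In this toy example both raters are equally accurate, yet they agree with each other on only four of six cases, which illustrates why accuracy and agreement are reported as separate outcomes.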
RESULTS
One hundred patients were included in the study. The mean age of presenting cases was 54 (±20) years. Abdominal pain was the most commonly reported symptom in provisional diagnoses (68%). The accuracy of provisional diagnoses compared to CT-confirmed diagnoses was 76% for the AI system and 74% for the surgical trainees (p = 0.744). Substantial agreement between the AI's and the trainees' initial diagnoses was observed (κ = 0.73). There were no statistically significant differences between the AI and the surgical trainees in provisional diagnosis (p = 0.754), decisions to bring patients to the operating theatre (p = 0.540), or antibiotic administration (p = 0.122). The overall agreement rates were 85%, 58%, and 74%, respectively.
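The abstract does not state which statistical test produced the accuracy comparison. As a rough check, and purely as an assumption, an uncorrected Pearson chi-squared test on 76/100 versus 74/100 correct diagnoses gives a p-value close to the reported 0.744. The sketch below uses only the standard library; the function name is hypothetical.

```python
import math


def chi2_two_proportions(correct_a, correct_b, n_a, n_b):
    """Uncorrected Pearson chi-squared test (1 df) comparing two proportions.

    Assumption: the two groups are treated as independent samples.
    """
    wrong_a = n_a - correct_a
    wrong_b = n_b - correct_b
    n = n_a + n_b
    # Closed-form chi-squared statistic for a 2x2 table.
    numerator = n * (correct_a * wrong_b - wrong_a * correct_b) ** 2
    denominator = n_a * n_b * (correct_a + correct_b) * (wrong_a + wrong_b)
    chi2 = numerator / denominator
    # Survival function of chi-squared with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts derived from the reported percentages with n = 100 in each group.
chi2, p_value = chi2_two_proportions(76, 74, 100, 100)
print(round(chi2, 3), round(p_value, 3))  # approximately 0.107 and 0.744
```

Because the AI and the trainees assessed the same 100 patients, a paired test such as McNemar's would normally be preferred; it needs the case-level cross-tabulation, which the abstract does not report.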
CONCLUSION
Commercially available AI models demonstrate diagnostic ability similar to that of junior surgical trainees and may serve as a useful decision-support system. These models could be incorporated into electronic record systems as an adjunct to enhance decision-making.
About the journal:
The Journal of Surgical Education (JSE) is dedicated to advancing the field of surgical education through original research. The journal publishes research articles in all surgical disciplines on topics relative to the education of surgical students, residents, and fellows, as well as practicing surgeons. Our readers look to JSE for timely, innovative research findings from the international surgical education community. As the official journal of the Association of Program Directors in Surgery (APDS), JSE publishes the proceedings of the annual APDS meeting held during Surgery Education Week.