Performance of ChatGPT-4o in thoracic trauma: A comparative evaluation with guidelines.

Impact factor: 1.0

Ulusal travma ve acil cerrahi dergisi = Turkish journal of trauma & emergency surgery (TJTES), Vol. 31, No. 9, pp. 839-846. Published 2025-09-01. DOI: 10.14744/tjtes.2025.47087

İsmail Dal, Mehmet Yildirim



Abstract

Background: This study aims to evaluate the performance of ChatGPT-4o in thoracic trauma management by comparing its responses to established clinical guidelines.

Methods: Five major guidelines relevant to thoracic trauma were reviewed: the Advanced Trauma Life Support (ATLS) guidelines (2018), the Eastern Association for the Surgery of Trauma (EAST) guidelines (2020), the Western Trauma Association critical decisions algorithm for the evaluation and management of traumatic pneumothorax (2022), the European Trauma Course (ETC) guidelines (2016), and the National Institute for Health and Care Excellence (NICE) trauma guidelines (2020). Fifty open-ended questions were developed from these guidelines and submitted to ChatGPT-4o. Five thoracic surgery specialists rated the artificial intelligence (AI) responses on a 5-point Likert scale.

Results: ChatGPT-4o achieved a mean score of 4.76 ± 0.57 across the 50 questions. It excelled on questions derived from well-defined guidelines, demonstrating an ability to synthesize and apply guideline-based medical knowledge. This performance is consistent with previous studies in urological trauma and emergency medicine, which reported similar reliability. However, its reliance on pre-existing data limits its effectiveness in highly nuanced or novel clinical scenarios. These findings underscore its potential as a complementary tool in guideline-driven medical contexts while emphasizing the need for clinical oversight in complex cases.

Conclusion: ChatGPT-4o performed strongly on thoracic trauma management questions, with few errors and high reliability. These results suggest it could serve as a valuable support tool for clinical decision-making, particularly in scenarios governed by established protocols. Further evaluation in broader medical domains is warranted.
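The abstract reports the result as mean ± SD but does not state whether the SD is computed over all individual ratings (50 questions × 5 raters) or over the 50 per-question mean scores. When every question receives all five ratings, the two conventions give the same mean but generally different SDs. The minimal Python sketch below illustrates this; the ratings are invented toy values (not the study's data), and the function names are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical toy data: ratings[q][r] is the 1-5 Likert score rater r gave to question q.
# These values are invented for illustration and are NOT the study's ratings.
ratings = [
    [5, 5, 5, 4, 5],
    [5, 4, 5, 5, 5],
    [3, 4, 4, 5, 4],
]

def summarize_all_ratings(ratings):
    """Mean and sample SD over every individual rating (questions x raters pooled)."""
    flat = [score for row in ratings for score in row]
    return mean(flat), stdev(flat)

def summarize_question_means(ratings):
    """Mean and sample SD over the per-question mean scores."""
    per_question = [mean(row) for row in ratings]
    return mean(per_question), stdev(per_question)

if __name__ == "__main__":
    m_all, sd_all = summarize_all_ratings(ratings)
    m_q, sd_q = summarize_question_means(ratings)
    print(f"pooled ratings:  {m_all:.2f} ± {sd_all:.2f}")
    print(f"question means:  {m_q:.2f} ± {sd_q:.2f}")
```

With these toy ratings both summaries give the same mean (about 4.53), while the pooled SD (about 0.64) is larger than the SD of the question means (about 0.46), so a reported dispersion figure is only interpretable once the convention is known.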
