ChatGPT and American Society of Anesthesiologists (ASA) classifications - utilizing artificial intelligence in ASA classification of pediatric surgical patients
Chaitanya Challa , Abdulla Ahmed , Giuliana Geng-Ramos , Jennica Luu , Sohel Rana , Jessica A. Cronin
{"title":"ChatGPT and American Society of Anesthesiologists (ASA) classifications - utilizing artificial intelligence in ASA classification of pediatric surgical patients","authors":"Chaitanya Challa , Abdulla Ahmed , Giuliana Geng-Ramos , Jennica Luu , Sohel Rana , Jessica A. Cronin","doi":"10.1016/j.pcorm.2025.100547","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><div>The American Society of Anesthesiologists (ASA) physical status classification system is a widely used tool to assess preoperative risk. However, variability in assigning ASA scores due to subjectivity among healthcare workers remains an issue. Advances in artificial intelligence (AI) present an opportunity to improve the consistency of ASA classifications. The aim of this study was to evaluate the potential of ChatGPT, a large language model (LLM), to assign ASA scores in pediatric surgical patients. The authors hypothesized that ChatGPT's classifications would correlate with anesthesiologist-determined ASA scores.</div></div><div><h3>Methods</h3><div>This retrospective cross-sectional pilot study was conducted at a tertiary pediatric hospital, including 203 pediatric patients who underwent surgery between June 4–7, 2023. Summaries of each patient's medical history and surgery details were created and reviewed by a board-certified anesthesiologist. These summaries were presented to both a study anesthesiologist and entered into ChatGPT (x2) for ASA classification. The ASA classifications by ChatGPT were compared to those provided by both the study anesthesiologist and the day-of-surgery (DOS) anesthesiologist. Cohen's kappa with linear weighting was used to assess inter-rater agreement between ChatGPT and anesthesiologists and to measure intra-rater reliability between different ChatGPT outputs.</div></div><div><h3>Results</h3><div>A total of 203 pediatric cases were analyzed. 
The agreement between repeated ASA classifications from ChatGPT was significant (κ=0.61, 95% CI 0.52–0.69) with 66% exact match in classifications. The agreement between the first ChatGPT output and the study anesthesiologist showed statistical agreement (κ=0.60, 95% CI 0.51–0.69), with a 66% match. Similarly, the second ChatGPT output had agreement with the study anesthesiologist (κ=0.59, 95% CI 0.50–0.68), with a 67% match. The highest agreement (κ=0.72, 95% CI 0.62–0.81) was observed between the DOS anesthesiologist and the study anesthesiologist, with a 75% match.</div></div><div><h3>Conclusions</h3><div>The correlation between ChatGPT's ASA scores and those assigned by the pilot study anesthesiologist was found to be 66–67%. These findings indicate that AI has the potential to support pediatric anesthesiologists in determining patient ASA classifications.</div></div>","PeriodicalId":53468,"journal":{"name":"Perioperative Care and Operating Room Management","volume":"40 ","pages":"Article 100547"},"PeriodicalIF":1.0000,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Perioperative Care and Operating Room Management","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2405603025000883","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"Nursing","Score":null,"Total":0}
Abstract
Background
The American Society of Anesthesiologists (ASA) physical status classification system is a widely used tool to assess preoperative risk. However, variability in assigning ASA scores due to subjectivity among healthcare workers remains an issue. Advances in artificial intelligence (AI) present an opportunity to improve the consistency of ASA classifications. The aim of this study was to evaluate the potential of ChatGPT, a large language model (LLM), to assign ASA scores in pediatric surgical patients. The authors hypothesized that ChatGPT's classifications would correlate with anesthesiologist-determined ASA scores.
Methods
This retrospective cross-sectional pilot study was conducted at a tertiary pediatric hospital and included 203 pediatric patients who underwent surgery between June 4 and 7, 2023. Summaries of each patient's medical history and surgical details were created and reviewed by a board-certified anesthesiologist. Each summary was then presented to a study anesthesiologist and entered into ChatGPT twice for ASA classification. The ASA classifications generated by ChatGPT were compared with those assigned by both the study anesthesiologist and the day-of-surgery (DOS) anesthesiologist. Cohen's kappa with linear weighting was used to assess inter-rater agreement between ChatGPT and the anesthesiologists and to measure intra-rater reliability across the two ChatGPT outputs.
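To make the statistic concrete, the following is a minimal illustrative sketch of Cohen's kappa with linear weights in pure Python, assuming ordinal ratings coded 1..k (e.g., ASA classes I–V coded 1–5); it is not the study's actual analysis code, and the function name is hypothetical.

```python
# Illustrative sketch: linearly weighted Cohen's kappa for two raters'
# ordinal scores coded 1..k. Hypothetical helper, not the study's code.

def linear_weighted_kappa(rater1, rater2, k):
    """Linearly weighted Cohen's kappa for paired ordinal ratings in 1..k."""
    n = len(rater1)
    # Observed contingency-table proportions.
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater1, rater2):
        obs[a - 1][b - 1] += 1.0 / n
    # Marginal proportions for each rater.
    p1 = [sum(row) for row in obs]
    p2 = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    # Linear disagreement weight: proportional to distance between categories,
    # so an ASA II vs. III disagreement costs less than ASA II vs. IV.
    w = lambda i, j: abs(i - j) / (k - 1)
    observed = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    expected = sum(w(i, j) * p1[i] * p2[j] for i in range(k) for j in range(k))
    return 1.0 - observed / expected

# Perfect agreement yields kappa = 1.0.
print(linear_weighted_kappa([1, 2, 3, 1, 2], [1, 2, 3, 1, 2], 3))  # 1.0
```

Linear weighting suits ASA scores because the classes are ordered: near-miss classifications are penalized less than distant ones, unlike unweighted kappa, which treats all disagreements equally.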
Results
A total of 203 pediatric cases were analyzed. Agreement between ChatGPT's repeated ASA classifications was significant (κ=0.61, 95% CI 0.52–0.69), with a 66% exact match. Agreement between the first ChatGPT output and the study anesthesiologist was also significant (κ=0.60, 95% CI 0.51–0.69), with a 66% exact match, and the second ChatGPT output showed similar agreement with the study anesthesiologist (κ=0.59, 95% CI 0.50–0.68), with a 67% exact match. The highest agreement (κ=0.72, 95% CI 0.62–0.81) was observed between the DOS anesthesiologist and the study anesthesiologist, with a 75% exact match.
Conclusions
ChatGPT's ASA scores exactly matched those assigned by the pilot study anesthesiologist in 66–67% of cases. These findings indicate that AI has the potential to support pediatric anesthesiologists in determining patient ASA classifications.
Journal overview
The objective of this online journal is to serve as a multidisciplinary, peer-reviewed source of information related to the administrative, economic, operational, safety, and quality aspects of ambulatory and inpatient operating room and interventional procedural processes. The journal provides high-quality information and research findings on operational and system-based approaches to ensure safe, coordinated, and high-value periprocedural care. With the current focus on value in health care, it is essential that there is a venue for researchers to publish articles on quality improvement process initiatives, process flow modeling, information management, efficient design, cost improvement, use of novel technologies, and management.