{"title":"Evaluating the Ability of Artificial Intelligence to Address Nuanced Cardiology Subspecialty Questions: ChatGPT and CathSAP","authors":"Saumya Nanda MBBS , Khaled Abaza MD , Pyae Hein Kyaw MBBS , Robert Frankel MD , Partha Sardar MD , Sahil A. Parikh MD , Tharun Shyam MBBS , Saurav Chatterjee MD","doi":"10.1016/j.jscai.2025.102563","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><div>Recent developments in artificial intelligence (AI), particularly in large language models, have shown promise in various fields, including health care. However, their performance on specialized medical board examinations, such as interventional cardiology assessments, remains relatively unexplored.</div></div><div><h3>Methods</h3><div>A cross-sectional study was conducted using a data set comprising 360 questions from the Cath Self Assessment Program (CathSAP) question bank. This study aimed to assess the overall performance of Chat Generative Pre-trained Transformer (ChatGPT) and compare it to that of average test takers. Additionally, the study evaluated the impact of pertinent educational materials on ChatGPT’s responses, both before and after exposure. The primary outcome measures included ChatGPT’s overall percentage score on the CathSAP examination and its performance across various subsections. Statistical significance was determined using the Kruskal-Wallis equality-of-populations rank test.</div></div><div><h3>Results</h3><div>Initially, ChatGPT achieved an overall score of 54.44% on the CathSAP exam, which improved significantly to 79.16% after exposure to relevant textual content. The improvement was statistically significant (<em>P</em> = .0003). Notably, the improved score was comparable with the average score achieved by typical test takers (as reported by CathSAP). ChatGPT demonstrated proficiency in sections covering basic science, pharmacology, and miscellaneous topics, although it struggled with anatomy, anatomic variants, and anatomic pathology questions.</div></div><div><h3>Conclusions</h3><div>The study demonstrates ChatGPT’s potential for learning and adapting to medical examination scenarios, with a notable enhancement in performance after exposure to educational materials. However, limitations such as the model’s inability to process certain visual materials and potential biases in AI models warrant further consideration. These findings underscore the need for continued research to optimize the use of AI in medical education and assessment.</div></div>","PeriodicalId":73990,"journal":{"name":"Journal of the Society for Cardiovascular Angiography & Interventions","volume":"4 3","pages":"Article 102563"},"PeriodicalIF":0.0000,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the Society for Cardiovascular Angiography & Interventions","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2772930325000043","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Background
Recent developments in artificial intelligence (AI), particularly in large language models, have shown promise in various fields, including health care. However, their performance on specialized medical board examinations, such as interventional cardiology assessments, remains relatively unexplored.
Methods
A cross-sectional study was conducted using a data set comprising 360 questions from the Cath Self Assessment Program (CathSAP) question bank. This study aimed to assess the overall performance of Chat Generative Pre-trained Transformer (ChatGPT) and compare it to that of average test takers. Additionally, the study evaluated ChatGPT’s responses before and after exposure to pertinent educational materials. The primary outcome measures included ChatGPT’s overall percentage score on the CathSAP examination and its performance across the individual subsections. Statistical significance was determined using the Kruskal-Wallis equality-of-populations rank test.
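As a minimal sketch of the kind of analysis the Methods describe (not the authors' actual code), the snippet below scores two sets of per-question correctness flags and compares them with a Kruskal-Wallis equality-of-populations rank test. The correctness data are hypothetical placeholders generated to loosely mimic the reported accuracies, not the CathSAP results.

```python
# A minimal sketch, assuming hypothetical per-question data; not the study's
# actual analysis pipeline.
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(0)
n_questions = 360  # size of the CathSAP question set used in the study

# Hypothetical 1/0 correctness flags, drawn to roughly match the reported
# overall scores (~54% before exposure, ~79% after); placeholder data only.
pre_exposure = rng.binomial(1, 0.54, n_questions)
post_exposure = rng.binomial(1, 0.79, n_questions)

def percent_score(flags: np.ndarray) -> float:
    """Overall percentage score from an array of 1/0 correctness flags."""
    return 100.0 * flags.mean()

print(f"Pre-exposure score:  {percent_score(pre_exposure):.2f}%")
print(f"Post-exposure score: {percent_score(post_exposure):.2f}%")

# Kruskal-Wallis H-test comparing the two populations of per-question outcomes.
stat, p_value = kruskal(pre_exposure, post_exposure)
print(f"Kruskal-Wallis H = {stat:.3f}, P = {p_value:.4f}")
```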
Results
Initially, ChatGPT achieved an overall score of 54.44% on the CathSAP examination, which improved to 79.16% after exposure to relevant textual content; the improvement was statistically significant (P = .0003). Notably, the improved score was comparable to the average score achieved by typical test takers (as reported by CathSAP). ChatGPT demonstrated proficiency in sections covering basic science, pharmacology, and miscellaneous topics, although it struggled with questions on anatomy, anatomic variants, and anatomic pathology.
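As a quick back-of-envelope check (an assumption for illustration, not counts reported by the authors), the reported percentage scores can be translated into approximate numbers of correctly answered questions out of the 360-question set:

```python
# Hypothetical arithmetic check: approximate correct-answer counts implied by
# the reported percentage scores on a 360-question set.
total = 360
for label, pct in [("initial", 54.44), ("post-exposure", 79.16)]:
    approx_correct = round(total * pct / 100)
    print(f"{label}: {pct}% of {total} ≈ {approx_correct} questions correct")
```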
Conclusions
The study demonstrates ChatGPT’s potential for learning and adapting to medical examination scenarios, with a notable enhancement in performance after exposure to educational materials. However, limitations such as the model’s inability to process certain visual materials and potential biases in AI models warrant further consideration. These findings underscore the need for continued research to optimize the use of AI in medical education and assessment.