Evaluating the Ability of Artificial Intelligence to Address Nuanced Cardiology Subspecialty Questions: ChatGPT and CathSAP

Saumya Nanda MBBS, Khaled Abaza MD, Pyae Hein Kyaw MBBS, Robert Frankel MD, Partha Sardar MD, Sahil A. Parikh MD, Tharun Shyam MBBS, Saurav Chatterjee MD
{"title":"评估人工智能解决细微心脏病亚专科问题的能力:ChatGPT和CathSAP","authors":"Saumya Nanda MBBS ,&nbsp;Khaled Abaza MD ,&nbsp;Pyae Hein Kyaw MBBS ,&nbsp;Robert Frankel MD ,&nbsp;Partha Sardar MD ,&nbsp;Sahil A. Parikh MD ,&nbsp;Tharun Shyam MBBS ,&nbsp;Saurav Chatterjee MD","doi":"10.1016/j.jscai.2025.102563","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><div>Recent developments in artificial intelligence (AI), particularly in large language models, have shown promise in various fields, including health care. However, their performance on specialized medical board examinations, such as interventional cardiology assessments, remains relatively unexplored.</div></div><div><h3>Methods</h3><div>A cross-sectional study was conducted using a data set comprising 360 questions from the Cath Self Assessment Program (CathSAP) question bank. This study aimed to assess the overall performance of Chat Generative Pre-trained Transformer (ChatGPT) and compare it to that of average test takers. Additionally, the study evaluated the impact of pertinent educational materials on ChatGPT’s responses, both before and after exposure. The primary outcome measures included ChatGPT’s overall percentage score on the CathSAP examination and its performance across various subsections. Statistical significance was determined using the Kruskal-Wallis equality-of-populations rank test.</div></div><div><h3>Results</h3><div>Initially, ChatGPT achieved an overall score of 54.44% on the CathSAP exam, which improved significantly to 79.16% after exposure to relevant textual content. The improvement was statistically significant (<em>P</em> = .0003). Notably, the improved score was comparable with the average score achieved by typical test takers (as reported by CathSAP). ChatGPT demonstrated proficiency in sections covering basic science, pharmacology, and miscellaneous topics, although it struggled with anatomy, anatomic variants, and anatomic pathology questions.</div></div><div><h3>Conclusions</h3><div>The study demonstrates ChatGPT’s potential for learning and adapting to medical examination scenarios, with a notable enhancement in performance after exposure to educational materials. However, limitations such as the model’s inability to process certain visual materials and potential biases in AI models warrant further consideration. These findings underscore the need for continued research to optimize the use of AI in medical education and assessment.</div></div>","PeriodicalId":73990,"journal":{"name":"Journal of the Society for Cardiovascular Angiography & Interventions","volume":"4 3","pages":"Article 102563"},"PeriodicalIF":0.0000,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Evaluating the Ability of Artificial Intelligence to Address Nuanced Cardiology Subspecialty Questions: ChatGPT and CathSAP\",\"authors\":\"Saumya Nanda MBBS ,&nbsp;Khaled Abaza MD ,&nbsp;Pyae Hein Kyaw MBBS ,&nbsp;Robert Frankel MD ,&nbsp;Partha Sardar MD ,&nbsp;Sahil A. Parikh MD ,&nbsp;Tharun Shyam MBBS ,&nbsp;Saurav Chatterjee MD\",\"doi\":\"10.1016/j.jscai.2025.102563\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Background</h3><div>Recent developments in artificial intelligence (AI), particularly in large language models, have shown promise in various fields, including health care. 
However, their performance on specialized medical board examinations, such as interventional cardiology assessments, remains relatively unexplored.</div></div><div><h3>Methods</h3><div>A cross-sectional study was conducted using a data set comprising 360 questions from the Cath Self Assessment Program (CathSAP) question bank. This study aimed to assess the overall performance of Chat Generative Pre-trained Transformer (ChatGPT) and compare it to that of average test takers. Additionally, the study evaluated the impact of pertinent educational materials on ChatGPT’s responses, both before and after exposure. The primary outcome measures included ChatGPT’s overall percentage score on the CathSAP examination and its performance across various subsections. Statistical significance was determined using the Kruskal-Wallis equality-of-populations rank test.</div></div><div><h3>Results</h3><div>Initially, ChatGPT achieved an overall score of 54.44% on the CathSAP exam, which improved significantly to 79.16% after exposure to relevant textual content. The improvement was statistically significant (<em>P</em> = .0003). Notably, the improved score was comparable with the average score achieved by typical test takers (as reported by CathSAP). ChatGPT demonstrated proficiency in sections covering basic science, pharmacology, and miscellaneous topics, although it struggled with anatomy, anatomic variants, and anatomic pathology questions.</div></div><div><h3>Conclusions</h3><div>The study demonstrates ChatGPT’s potential for learning and adapting to medical examination scenarios, with a notable enhancement in performance after exposure to educational materials. However, limitations such as the model’s inability to process certain visual materials and potential biases in AI models warrant further consideration. These findings underscore the need for continued research to optimize the use of AI in medical education and assessment.</div></div>\",\"PeriodicalId\":73990,\"journal\":{\"name\":\"Journal of the Society for Cardiovascular Angiography & Interventions\",\"volume\":\"4 3\",\"pages\":\"Article 102563\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of the Society for Cardiovascular Angiography & Interventions\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2772930325000043\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the Society for Cardiovascular Angiography & Interventions","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2772930325000043","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract


Background

Recent developments in artificial intelligence (AI), particularly in large language models, have shown promise in various fields, including health care. However, their performance on specialized medical board examinations, such as interventional cardiology assessments, remains relatively unexplored.

Methods

A cross-sectional study was conducted using a data set comprising 360 questions from the Cath Self Assessment Program (CathSAP) question bank. This study aimed to assess the overall performance of Chat Generative Pre-trained Transformer (ChatGPT) and compare it to that of average test takers. Additionally, the study evaluated the impact of pertinent educational materials on ChatGPT’s responses, both before and after exposure. The primary outcome measures included ChatGPT’s overall percentage score on the CathSAP examination and its performance across various subsections. Statistical significance was determined using the Kruskal-Wallis equality-of-populations rank test.
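
For readers unfamiliar with the analysis, the following Python sketch shows how a Kruskal-Wallis test of the kind named above could be run with SciPy. The section scores here are hypothetical placeholders, not the study's data, and the exact grouping the authors used is not reproduced.

```python
# Minimal sketch of a Kruskal-Wallis comparison, assuming per-section
# percentage scores are compared before and after exposure to the
# educational material. All numbers are hypothetical placeholders.
from scipy.stats import kruskal

scores_before = [50.0, 62.5, 45.0, 58.3, 54.4, 48.0]  # pre-exposure, by subsection
scores_after = [78.0, 85.0, 70.0, 81.2, 79.2, 76.5]   # post-exposure, by subsection

# Kruskal-Wallis rank test: do the two score sets come from the same population?
statistic, p_value = kruskal(scores_before, scores_after)
print(f"H = {statistic:.3f}, P = {p_value:.4f}")
```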

Results

Initially, ChatGPT achieved an overall score of 54.44% on the CathSAP exam, which improved significantly to 79.16% after exposure to relevant textual content. The improvement was statistically significant (P = .0003). Notably, the improved score was comparable with the average score achieved by typical test takers (as reported by CathSAP). ChatGPT demonstrated proficiency in sections covering basic science, pharmacology, and miscellaneous topics, although it struggled with anatomy, anatomic variants, and anatomic pathology questions.
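
For context, if all 360 questions were weighted equally (an assumption; the abstract does not state the scoring scheme), the reported percentages would correspond to roughly the following raw counts:

```python
# Hypothetical back-calculation, assuming equal weighting of all 360 questions.
TOTAL_QUESTIONS = 360
for pct in (54.44, 79.16):
    correct = round(pct / 100 * TOTAL_QUESTIONS)
    print(f"{pct}% of {TOTAL_QUESTIONS} questions ≈ {correct} correct")
# Prints approximately 196 correct pre-exposure and 285 post-exposure.
```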

Conclusions

The study demonstrates ChatGPT’s potential for learning and adapting to medical examination scenarios, with a notable enhancement in performance after exposure to educational materials. However, limitations such as the model’s inability to process certain visual materials and potential biases in AI models warrant further consideration. These findings underscore the need for continued research to optimize the use of AI in medical education and assessment.