Comparative analysis of large language models and clinician responses in patient blood management knowledge.

IF 2.8 · CAS Tier 3 (Medicine) · Q1 ANESTHESIOLOGY
Felix Tran, Patrick Meybohm, Lea V Blum, Vanessa Neef, Jan A Kloka, Florian Rumpf, Tobias E Haas, Sebastian Hottenrott, Philipp Helmer, Peter Kranke, Benedikt Schmid, Denana Mehic, Kai Zacharowski, Suma Choorapoikayil
DOI: 10.23736/S0375-9393.25.19014-7
Journal: Minerva Anestesiologica
Published: 2025-08-05 (Journal Article)
Citations: 0

Abstract

Background: Large language models (LLMs) are increasingly used in the medical field and have the potential to reduce workload and improve treatment procedures in clinical practice. This study evaluates the capabilities of LLMs to answer common questions related to patient blood management (PBM) and compares their performance to the expertise of clinicians from two university hospitals.

Methods: To evaluate the performance of ChatGPT-3.5, ChatGPT-4o, and Google Gemini in answering PBM-related questions, we used a representative sample of 40 questions (30 single-choice and 10 frequently asked patient questions) and compared their responses to those of clinicians. The accuracy and interrater reliability of the answers were analyzed.

Results: For PBM knowledge-based questions, the proportion of correct answers was 96.4% (95% CI: 93.6-98.0%) for ChatGPT-4o, 81.3% (95% CI: 77.0-85.7%) for ChatGPT-3.5, and 84.0% (95% CI: 79.4-87.7%) for Google Gemini. Clinicians (N.=82) provided correct answers to 76.5% (95% CI: 74.7-78.1%) of the questions. For frequently asked patient questions, the proportion of correct answers was 100% for ChatGPT-4o, 95.5% (95% CI: 91.4-99.6%) for ChatGPT-3.5, and 91.7% (95% CI: 86.0-97.4%) for Google Gemini. Clinicians provided correct answers to 62.0% (95% CI: 58.7-65.3%) of the questions. Across the categories (anemia management, iron supplementation, cell salvage, principles of PBM, and blood transfusion), ChatGPT-4o achieved the highest scores, providing the most correct answers.
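The abstract reports each accuracy figure with a 95% confidence interval, though it does not state which interval method was used. A minimal sketch of one common choice, the normal-approximation (Wald) interval, using hypothetical counts for illustration:

```python
import math

def wald_ci(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) 95% CI for a proportion of correct answers."""
    p = correct / total
    se = math.sqrt(p * (1 - p) / total)  # standard error of the proportion
    return max(0.0, p - z * se), min(1.0, p + z * se)  # clamp to [0, 1]

# Hypothetical counts; the paper reports only percentages, not raw tallies.
lo, hi = wald_ci(correct=27, total=30)
print(f"{27/30:.1%} (95% CI: {lo:.1%}-{hi:.1%})")  # → 90.0% (95% CI: 79.3%-100.0%)
```

For small samples or proportions near 0% or 100% (as with ChatGPT-4o's perfect score), Wilson or Clopper-Pearson intervals are usually preferred over the Wald approximation.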

Conclusions: LLMs show strong potential for delivering accurate and comprehensive responses to common PBM-related questions. However, it remains essential for clinicians and patients to verify responses, particularly in critical situations.

Source journal
Minerva Anestesiologica (Medicine - Anesthesiology)
CiteScore: 4.50
Self-citation rate: 21.90%
Annual articles: 367
Review time: 4-8 weeks
Journal description: Minerva Anestesiologica is the journal of the Italian National Society of Anaesthesia, Analgesia, Resuscitation, and Intensive Care. It publishes scientific papers on anesthesiology, intensive care, analgesia, perioperative medicine, and related fields. Manuscripts are expected to comply with the instructions for authors, which conform to the Uniform Requirements for Manuscripts Submitted to Biomedical Journals of the International Committee of Medical Journal Editors.