Artificial intelligence chatbots in transfusion medicine: A cross-sectional study.

IF 1.8 | CAS Zone 4, Medicine | JCR Q3, Hematology

Vox Sanguinis Pub Date : 2025-03-05 DOI:10.1111/vox.70009

Prateek Srivastava, Ashish Tewari, Arwa Z Al-Riyami

Citations: 0

Abstract

Background and objectives: The recent rise of artificial intelligence (AI) chatbots has attracted many users worldwide. However, expert evaluation is essential before relying on them for transfusion medicine (TM)-related information. This study aims to evaluate the performance of AI chatbots for accuracy, correctness, completeness and safety.

Materials and methods: Six AI chatbots (ChatGPT 4, ChatGPT 4-o, Gemini Advanced, Copilot, Anthropic Claude 3.5 Sonnet, Meta AI) were tested using TM-related prompts at two time points, 30 days apart. Their responses were assessed by four TM experts. Evaluators' scores underwent inter-rater reliability testing. Responses from Day 30 were compared with those from Day 1 to evaluate consistency and potential evolution over time.
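The abstract does not say which inter-rater reliability statistic was applied to the four evaluators' scores. As a minimal sketch only, assuming categorical scores and assuming Fleiss' kappa (a common choice for more than two raters, not confirmed by the source), the agreement calculation could look like the following. The function name and the example scores are hypothetical and are not data from the study.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for categorical ratings.

    ratings: list of rows, one row per chatbot response, each row holding
    one categorical score per rater (same number of raters in every row).
    ASSUMPTION: the study's actual statistic is not stated in the abstract.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")
    if any(len(row) != n_raters for row in ratings):
        raise ValueError("every response needs the same number of ratings")

    # Count how many raters chose each category for each response.
    categories = sorted({score for row in ratings for score in row})
    counts = [[Counter(row)[c] for c in categories] for row in ratings]

    # Observed agreement: mean over responses of the pairwise agreement rate.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_items) / n_items

    # Chance agreement from the overall share of each category.
    total = n_items * n_raters
    p_category = [
        sum(row[j] for row in counts) / total for j in range(len(categories))
    ]
    p_expected = sum(p * p for p in p_category)

    if p_expected == 1.0:
        # Every rating fell in a single category; treat as full agreement.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical scores: 3 responses, each rated by 4 experts on a 1-3 scale.
example_scores = [
    [3, 3, 3, 2],
    [1, 1, 2, 1],
    [2, 2, 2, 2],
]
kappa = fleiss_kappa(example_scores)
```

Values near 1 indicate strong agreement among raters, values near 0 indicate agreement no better than chance. For ordinal or continuous scores, a weighted kappa or an intraclass correlation coefficient may be more appropriate than this unweighted form.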

Results: All six chatbots exhibited some level of inconsistency and varying degrees of evolution in their responses over 30 days. None provided entirely correct, complete or safe answers to all questions. Among the chatbots tested, ChatGPT 4-o and Anthropic Claude 3.5 Sonnet demonstrated the highest accuracy and consistency, while Microsoft Copilot and Google Gemini Advanced showed the greatest evolution in their responses. As a limitation, the 30-day period may be too short for a precise assessment of chatbot evolution.

Conclusion: At the time this study was conducted, none of the AI chatbots provided fully reliable, complete or safe responses to all TM-related prompts. However, ChatGPT 4-o and Anthropic Claude 3.5 Sonnet show the highest promise for future integration into TM practice. Given their variability and ongoing development, AI chatbots should not yet be relied upon as authoritative sources in TM without expert validation.

Source journal: Vox Sanguinis (Medicine, Hematology)
CiteScore: 4.40
Self-citation rate: 11.10%
Articles published: 156
Review time: 6-12 weeks

Journal description: Vox Sanguinis reports on important, novel developments in transfusion medicine. Original papers, reviews and international fora are published on all aspects of blood transfusion and tissue transplantation, comprising five main sections:

1) Transfusion-Transmitted Disease and its Prevention: Identification and epidemiology of infectious agents transmissible by blood; Bacterial contamination of blood components; Donor recruitment and selection methods; Pathogen inactivation.
2) Blood Component Collection and Production: Blood collection methods and devices (including apheresis); Plasma fractionation techniques and plasma derivatives; Preparation of labile blood components; Inventory management; Hematopoietic progenitor cell collection and storage; Collection and storage of tissues; Quality management and good manufacturing practice; Automation and information technology.
3) Transfusion Medicine and New Therapies: Transfusion thresholds and audits; Haemovigilance; Clinical trials regarding appropriate haemotherapy; Non-infectious adverse effects of transfusion; Therapeutic apheresis; Support of transplant patients; Gene therapy and immunotherapy.
4) Immunohaematology and Immunogenetics: Autoimmunity in haematology; Alloimmunity of blood; Pre-transfusion testing; Immunodiagnostics; Immunobiology; Complement in immunohaematology; Blood typing reagents; Genetic markers of blood cells and serum proteins: polymorphisms and function; Genetic markers and disease; Parentage testing and forensic immunohaematology.
5) Cellular Therapy: Cell-based therapies; Stem cell sources; Stem cell processing and storage; Stem cell products; Stem cell plasticity; Regenerative medicine with cells; Cellular immunotherapy; Molecular therapy; Gene therapy.