Artificial intelligence chatbots in transfusion medicine: A cross-sectional study
Prateek Srivastava, Ashish Tewari, Arwa Z Al-Riyami
Vox Sanguinis (journal article), published 2025-03-05. DOI: 10.1111/vox.70009 (https://doi.org/10.1111/vox.70009). Impact factor 1.8; JCR Q3 (Hematology).
Abstract
Background and objectives: The recent rise of artificial intelligence (AI) chatbots has attracted many users worldwide. However, expert evaluation is essential before relying on them for transfusion medicine (TM)-related information. This study aims to evaluate the performance of AI chatbots for accuracy, correctness, completeness and safety.
Materials and methods: Six AI chatbots (ChatGPT 4, ChatGPT 4-o, Gemini Advanced, Copilot, Anthropic Claude 3.5 Sonnet, Meta AI) were tested using TM-related prompts at two time points, 30 days apart. Their responses were assessed by four TM experts. Evaluators' scores underwent inter-rater reliability testing. Responses from Day 30 were compared with those from Day 1 to evaluate consistency and potential evolution over time.
Results: All six chatbots exhibited some level of inconsistency and varying degrees of evolution in their responses over 30 days. None provided entirely correct, complete or safe answers to all questions. Among the chatbots tested, ChatGPT 4-o and Anthropic Claude 3.5 Sonnet demonstrated the highest accuracy and consistency, while Microsoft Copilot and Google Gemini Advanced showed the greatest evolution in their responses. As a limitation, the 30-day period may be too short for a precise assessment of chatbot evolution.
Conclusion: At the time of the conduct of this study, none of the AI chatbots provided fully reliable, complete or safe responses to all TM-related prompts. However, ChatGPT 4-o and Anthropic Claude 3.5 Sonnet show the highest promise for future integration into TM practices. Given their variability and ongoing development, AI chatbots should not yet be relied upon as authoritative sources in TM without expert validation.
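The methods mention inter-rater reliability testing of the four experts' scores, but the abstract does not say which statistic was used. As an illustration only, the sketch below computes Fleiss' kappa, one common agreement measure for more than two raters. The function name `fleiss_kappa`, the 1-3 rating scale and the example scores are all hypothetical and are not taken from the study.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of items, each rated by the same number of raters.

    ratings    -- list of lists; ratings[i] holds one category label per rater
    categories -- iterable of all possible category labels
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    categories = list(categories)
    counts = [Counter(item) for item in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall proportion of each category.
    total = n_items * n_raters
    p_expected = sum(
        (sum(c[k] for c in counts) / total) ** 2 for k in categories
    )

    if p_expected == 1.0:
        # Every rating falls in one category; kappa is undefined here.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Hypothetical scores: 5 chatbot responses, 4 raters, scale 1 (poor) to 3 (good).
    example_scores = [
        [3, 3, 3, 2],
        [1, 1, 2, 1],
        [2, 2, 2, 2],
        [3, 2, 3, 3],
        [1, 1, 1, 1],
    ]
    print(round(fleiss_kappa(example_scores, categories=[1, 2, 3]), 3))
```

A value near 1 indicates strong agreement among raters, a value near 0 indicates agreement no better than chance, and negative values indicate systematic disagreement. For ordinal scores, a weighted statistic or an intraclass correlation may be a better fit; this choice is an assumption of the sketch.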
About the journal:
Vox Sanguinis reports on important, novel developments in transfusion medicine. Original papers, reviews and international fora are published on all aspects of blood transfusion and tissue transplantation, comprising five main sections:
1) Transfusion-Transmitted Disease and its Prevention:
Identification and epidemiology of infectious agents transmissible by blood;
Bacterial contamination of blood components;
Donor recruitment and selection methods;
Pathogen inactivation.
2) Blood Component Collection and Production:
Blood collection methods and devices (including apheresis);
Plasma fractionation techniques and plasma derivatives;
Preparation of labile blood components;
Inventory management;
Hematopoietic progenitor cell collection and storage;
Collection and storage of tissues;
Quality management and good manufacturing practice;
Automation and information technology.
3) Transfusion Medicine and New Therapies:
Transfusion thresholds and audits;
Haemovigilance;
Clinical trials regarding appropriate haemotherapy;
Non-infectious adverse effects of transfusion;
Therapeutic apheresis;
Support of transplant patients;
Gene therapy and immunotherapy.
4) Immunohaematology and Immunogenetics:
Autoimmunity in haematology;
Alloimmunity of blood;
Pre-transfusion testing;
Immunodiagnostics;
Immunobiology;
Complement in immunohaematology;
Blood typing reagents;
Genetic markers of blood cells and serum proteins: polymorphisms and function;
Genetic markers and disease;
Parentage testing and forensic immunohaematology.
5) Cellular Therapy:
Cell-based therapies;
Stem cell sources;
Stem cell processing and storage;
Stem cell products;
Stem cell plasticity;
Regenerative medicine with cells;
Cellular immunotherapy;
Molecular therapy;
Gene therapy.