Comparative performance analysis of artificial intelligence models in therapeutic apheresis training: A pilot study

IF 1.2, Zone 4 (Medicine), JCR Q4 (Hematology)

Transfusion and Apheresis Science. Pub Date: 2025-06-20. DOI: 10.1016/j.transci.2025.104188

Mehmet Koca

Volume 64, Issue 4, Article 104188. Source: https://www.sciencedirect.com/science/article/pii/S1473050225001259

Citations: 0

Abstract



Objectives

This study aims to evaluate the theoretical knowledge level, consistency, and performance of three different artificial intelligence (AI) models—ChatGPT-4o, o1-preview, and Claude 3.5 Sonnet (New)—based on 75 five-option multiple-choice questions from the question pool of the therapeutic apheresis certification exam organized by the Republic of Türkiye Ministry of Health.

Methods

In the study, 75 questions from the apheresis course exam were presented to the AI models in separate conversation sessions, requiring step-by-step reasoning. Each question was asked twice to prevent inconsistencies in the models' responses; if a discrepancy was detected between the first two answers, a third query was conducted. This method resulted in a total of 485 question-answer records. The data were analyzed using correct answer rates, Cohen's kappa coefficient for agreement between runs, correlation analysis, and the chi-square test.
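The repeat-query protocol and the agreement statistic described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's analysis code: the function names `cohen_kappa` and `total_records` are hypothetical, and the ten answers below are made-up labels rather than study data. The only figures taken from the abstract are 75 questions, three models, and 485 records. If every question was asked exactly twice per model, that total implies 35 third queries.

```python
from collections import Counter


def cohen_kappa(run1, run2):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(run1) != len(run2) or not run1:
        raise ValueError("runs must be non-empty and of equal length")
    n = len(run1)
    # Observed agreement: share of questions answered identically in both runs.
    p_o = sum(a == b for a, b in zip(run1, run2)) / n
    # Chance agreement, from each run's marginal label frequencies.
    c1, c2 = Counter(run1), Counter(run2)
    p_e = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # both runs used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


def total_records(n_questions, n_models, n_discrepancies):
    """Two queries per question per model, plus one extra per discrepancy."""
    return 2 * n_questions * n_models + n_discrepancies


# Hypothetical answers (options A-E) for 10 questions; NOT data from the study.
run1 = list("ABCDEABCDE")
run2 = list("ABCDEABCEA")
kappa = cohen_kappa(run1, run2)

# Third queries implied by the reported total, assuming exactly two base
# queries per question for each of the three models.
implied_third_queries = 485 - 2 * 75 * 3
```

With these toy labels the two runs agree on 8 of 10 questions and chance agreement is 0.2, so kappa comes out near 0.75, the same range as the values reported in the Results.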

Results

The overall accuracy rates were determined as 61 % for ChatGPT-4o, 67 % for o1-preview, and 59 % for Claude 3.5 Sonnet. The consistency between the two runs of the models was found to be good (kappa = 0.700–0.765). In correlation analyses between the AI models' responses and the answer key, the o1-preview model demonstrated the highest agreement (r = 0.494, p < 0.001).
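To show how a chi-square test can compare accuracy rates like these, here is a hedged Python sketch. The abstract does not give the underlying counts or denominators, so the table is an assumption: it is back-calculated by treating each percentage as a share of 75 questions (46, 50, and 44 correct). The helper names `chi_square_homogeneity` and `p_value_two_dof` are hypothetical, and the output should not be read as the paper's reported statistic.

```python
import math


def chi_square_homogeneity(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if all rows shared the same accuracy rate.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_two_dof(stat):
    """Upper-tail probability of a chi-square variable with exactly 2 dof."""
    return math.exp(-stat / 2)


# ASSUMED counts [correct, incorrect] out of 75 questions per model,
# back-calculated from the rounded percentages; not taken from the paper.
assumed_counts = [
    [46, 29],  # ChatGPT-4o, about 61 %
    [50, 25],  # o1-preview, about 67 %
    [44, 31],  # Claude 3.5 Sonnet, about 59 %
]
stat, dof = chi_square_homogeneity(assumed_counts)
p_value = p_value_two_dof(stat)
```

The closed-form p-value only holds for two degrees of freedom, which is what a 3 x 2 table gives. A general implementation would use a statistics library instead.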

Conclusion

The findings suggest that the examined AI models perform at a level close to the certification threshold in the field of therapeutic apheresis. Future studies are recommended to include larger question pools and explore different medical disciplines.


Source journal

Transfusion and Apheresis Science (Medicine: Hematology)

CiteScore: 3.60

Self-citation rate: 5.30%

Articles published: 181

Review time: 42 days

About the journal: Transfusion and Apheresis Science brings comprehensive and up-to-date information to physicians and health care professionals involved in the rapidly changing fields of transfusion medicine, hemostasis and apheresis. The journal presents original articles relating to scientific and clinical studies in the areas of immunohematology, transfusion practice, bleeding and thrombotic disorders and both therapeutic and donor apheresis including hematopoietic stem cells. Topics covered include the collection and processing of blood, compatibility testing and guidelines for the use of blood products, as well as screening for and transmission of blood-borne diseases. All areas of apheresis - therapeutic and collection - are also addressed. We would like to specifically encourage allied health professionals in this area to submit manuscripts that relate to improved patient and donor care, technical aspects and educational issues. Transfusion and Apheresis Science features a "Theme" section which includes, in each issue, a group of papers designed to review a specific topic of current importance in transfusion and hemostasis for the discussion of topical issues specific to apheresis and focuses on the operators' viewpoint. Another section is "What's Happening" which provides informal reporting of activities in the field. In addition, brief case reports and Letters to the Editor, as well as reviews of meetings and events of general interest, and a listing of recent patents make the journal a complete source of information for practitioners of transfusion, hemostasis and apheresis science. Immediate dissemination of important information is ensured by the commitment of Transfusion and Apheresis Science to rapid publication of both symposia and submitted papers.