Comparative performance analysis of artificial intelligence models in therapeutic apheresis training: A pilot study
Author: Mehmet Koca
Journal: Transfusion and Apheresis Science, volume 64, issue 4, Article 104188 (published 2025-06-20)
DOI: 10.1016/j.transci.2025.104188
Source: https://www.sciencedirect.com/science/article/pii/S1473050225001259
Objectives
This study aims to evaluate the theoretical knowledge level, consistency, and performance of three different artificial intelligence (AI) models—ChatGPT-4o, o1-preview, and Claude 3.5 Sonnet (New)—based on 75 five-option multiple-choice questions from the question pool of the therapeutic apheresis certification exam organized by the Republic of Türkiye Ministry of Health.
Methods
In the study, 75 questions from the apheresis course exam were presented to the AI models in separate conversation sessions, with step-by-step reasoning required. Each question was asked twice to check the consistency of the models' responses; if the first two answers disagreed, a third query was run. This procedure yielded a total of 485 question-answer records. The data were analyzed using correct-answer rates, Cohen's kappa coefficient for agreement between runs, correlation analysis, and the chi-square test.
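The repeat-query procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's actual tooling: `query_with_tiebreak`, `third_query_count`, and the `scripted` stand-in for a model call are hypothetical names, and the abstract does not say how the third answer was used, so the sketch simply records it. The arithmetic at the end assumes all three models answered every one of the 75 questions twice, in which case the 485 records imply 35 third queries.

```python
from typing import Callable, Iterable, List


def query_with_tiebreak(ask: Callable[[], str]) -> List[str]:
    """Ask the same question twice; ask a third time only if the answers differ."""
    answers = [ask(), ask()]
    if answers[0] != answers[1]:
        answers.append(ask())
    return answers


def third_query_count(total_records: int, n_models: int, n_questions: int) -> int:
    """Records beyond the two mandatory runs per model and question are third queries."""
    return total_records - 2 * n_models * n_questions


def scripted(replies: Iterable[str]) -> Callable[[], str]:
    """Hypothetical stand-in for a model call: replays a fixed list of answers."""
    it = iter(replies)
    return lambda: next(it)


consistent = query_with_tiebreak(scripted(["B", "B"]))         # two records
inconsistent = query_with_tiebreak(scripted(["B", "D", "B"]))  # three records
third_queries = third_query_count(485, n_models=3, n_questions=75)

print(consistent, inconsistent, third_queries)
```

Under that assumption, 2 × 3 × 75 = 450 records come from the mandatory runs and the remaining 35 are third queries triggered by a discrepancy.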
Results
Overall accuracy was 61% for ChatGPT-4o, 67% for o1-preview, and 59% for Claude 3.5 Sonnet. Agreement between the two runs of each model was good (kappa = 0.700–0.765). In correlation analyses between the AI models' responses and the answer key, the o1-preview model showed the highest agreement (r = 0.494, p < 0.001).
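For readers unfamiliar with the two headline metrics, the following minimal Python sketch shows how a correct-answer rate and Cohen's kappa between two runs could be computed for five-option answers. The function names and the ten-item toy data are invented for illustration; they are not the study's data, and the abstract does not describe the software used for the analysis.

```python
from collections import Counter
from typing import Sequence


def accuracy(answers: Sequence[str], key: Sequence[str]) -> float:
    """Share of answers that match the answer key."""
    if len(answers) != len(key) or not key:
        raise ValueError("answers and key must be non-empty and of equal length")
    return sum(a == k for a, k in zip(answers, key)) / len(key)


def cohen_kappa(run1: Sequence[str], run2: Sequence[str]) -> float:
    """Cohen's kappa for two runs that label the same items."""
    if len(run1) != len(run2) or not run1:
        raise ValueError("runs must be non-empty and of equal length")
    n = len(run1)
    # Observed agreement: share of items given the same label in both runs.
    observed = sum(a == b for a, b in zip(run1, run2)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    c1, c2 = Counter(run1), Counter(run2)
    expected = sum(c1[label] * c2[label] for label in c1) / (n * n)
    if expected == 1.0:
        # Kappa is undefined when both runs use a single identical label;
        # this sketch returns 1.0 in that case.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy data (hypothetical): ten questions, options A to E.
key = ["A", "B", "C", "D", "E", "A", "B", "C", "D", "E"]
run1 = list(key)                                            # all correct
run2 = ["B", "B", "C", "D", "E", "A", "B", "C", "D", "D"]   # two answers changed

kappa = cohen_kappa(run1, run2)
acc_run2 = accuracy(run2, key)
print(round(kappa, 3), acc_run2)
```

In the toy data the runs agree on 8 of 10 items and chance agreement is 0.2, so kappa is (0.8 − 0.2) / (1 − 0.2) = 0.75, which happens to fall within the range reported above.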
Conclusion
The findings suggest that the examined AI models perform at a level close to the certification threshold in the field of therapeutic apheresis. Future studies should include larger question pools and explore other medical disciplines.
About the journal:
Transfusion and Apheresis Science brings comprehensive and up-to-date information to physicians and health care professionals involved in the rapidly changing fields of transfusion medicine, hemostasis and apheresis. The journal presents original articles relating to scientific and clinical studies in the areas of immunohematology, transfusion practice, bleeding and thrombotic disorders and both therapeutic and donor apheresis including hematopoietic stem cells. Topics covered include the collection and processing of blood, compatibility testing and guidelines for the use of blood products, as well as screening for and transmission of blood-borne diseases. All areas of apheresis - therapeutic and collection - are also addressed. We would like to specifically encourage allied health professionals in this area to submit manuscripts that relate to improved patient and donor care, technical aspects and educational issues.
Transfusion and Apheresis Science features a "Theme" section which includes, in each issue, a group of papers designed to review a specific topic of current importance in transfusion and hemostasis for the discussion of topical issues specific to apheresis and focuses on the operators' viewpoint. Another section is "What's Happening" which provides informal reporting of activities in the field. In addition, brief case reports and Letters to the Editor, as well as reviews of meetings and events of general interest, and a listing of recent patents make the journal a complete source of information for practitioners of transfusion, hemostasis and apheresis science. Immediate dissemination of important information is ensured by the commitment of Transfusion and Apheresis Science to rapid publication of both symposia and submitted papers.