Jong Kwon Lee, Sooin Choi, Sholhui Park, Sang-Hyun Hwang, Duck Cho
Annals of Laboratory Medicine. Journal Article. Published 2025-04-28. DOI: 10.3343/alm.2024.0588 (https://doi.org/10.3343/alm.2024.0588)
Evaluation of Six Large Language Models for Clinical Decision Support: Application in Transfusion Decision-making for RhD Blood-type Patients.
Background
Large language models (LLMs) have the potential for clinical decision support; however, their use in specific tasks, such as determining the RhD blood type for transfusion, remains underexplored. Therefore, we evaluated the accuracy of six LLMs in addressing RhD blood type-related issues in Korean healthcare.
Methods
Fifteen multiple-choice and true/false questions, based on real-world transfusion scenarios and reviewed by specialists, were developed. The questions were administered twice to six LLMs (Clova X, Gemini 1.0, Gemini 1.5, ChatGPT-3.5, GPT-4.0, and GPT-4o) in both Korean and English. Results were compared against the performance of 22 transfusion medicine experts. For particularly challenging questions, prompt engineering was applied, and the questions were reevaluated.
Results
GPT-4o achieved the highest accuracy rate in Korean (0.6), significantly higher than those of Clova X and Gemini (P < 0.05). In English, the results were similar across all models. The transfusion experts achieved a higher accuracy rate (0.8). Among the five questions subjected to prompt engineering, only GPT-4o correctly responded to one, whereas the other models failed. All LLMs changed their responses or did not respond when the same question was repeated.
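The two quantities reported above, the accuracy rate and whether a model's response changed or was missing on repeat administration, can be illustrated with a minimal sketch. This is not the authors' scoring procedure: the function names, the toy answer key, and the rule that a missing response counts as incorrect are all assumptions made for illustration, and the data are invented.

```python
from typing import List, Optional, Sequence


def accuracy_rate(responses: Sequence[Optional[str]],
                  answer_key: Sequence[str]) -> float:
    """Fraction of questions answered correctly.

    Assumption (not stated in the abstract): a missing response (None)
    is scored as incorrect.
    """
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    correct = sum(1 for r, k in zip(responses, answer_key)
                  if r is not None and r == k)
    return correct / len(answer_key)


def changed_between_runs(run1: Sequence[Optional[str]],
                         run2: Sequence[Optional[str]]) -> List[int]:
    """1-based question numbers whose response differed or was missing
    in either of two administrations of the same question set."""
    return [i for i, (a, b) in enumerate(zip(run1, run2), start=1)
            if a is None or b is None or a != b]


# Hypothetical toy data (NOT from the study): five questions, asked twice.
answer_key = ["A", "T", "C", "F", "B"]
run1 = ["A", "T", "D", "F", None]   # None = model did not respond
run2 = ["A", "F", "D", "F", "B"]

print(accuracy_rate(run1, answer_key))    # 0.6
print(accuracy_rate(run2, answer_key))    # 0.6
print(changed_between_runs(run1, run2))   # [2, 5]
```

In this toy example both runs score 0.6, yet questions 2 and 5 received different or missing responses, which shows why a repeat administration can reveal instability that a single accuracy figure hides.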
Conclusions
GPT-4o showed the best overall performance among the models tested and may be beneficial in RhD blood product transfusion decision-making. However, its performance suggests that it may serve best in a supportive role rather than as a primary decision-making tool.
About the journal:
Annals of Laboratory Medicine is the official journal of the Korean Society for Laboratory Medicine. The journal title was changed from the Korean Journal of Laboratory Medicine (ISSN 1598-6535) beginning with the January 2012 issue. The JCR 2017 impact factor of Ann Lab Med was 1.916.