Nikola Vladic, Stephan Nopp, Ingrid Pabinger, Walter Ageno, Jean M Connors, Sabine Eichinger, Cihan Ay
Journal of Thrombosis and Haemostasis. Published 2025-09-25. Publication type: Journal Article. DOI: 10.1016/j.jtha.2025.09.004 (https://doi.org/10.1016/j.jtha.2025.09.004)
Large language models versus thrombosis experts: A comparative study on patient education and clinical decision-making in venous thromboembolism.
Background: Large language models (LLMs) have demonstrated remarkable capabilities in various medical fields, yet their performance in thrombosis and haemostasis, particularly in patient education and complex clinical decision-making, is unexplored.
Objectives: We aimed to compare the quality of responses from LLMs versus thrombosis experts and to assess clinicians' ability to distinguish between them.
Methods: Three experts on thrombosis and haemostasis and three LLMs (Le Chat Pixtral Large, DeepSeek-R1, ChatGPT-4.5) answered three patient education and three clinical decision-making queries. Thirty-seven physicians rated responses for adequacy (1 = "very poor", 10 = "excellent") and estimated their origin (1 = "certainly LLM," 10 = "certainly human"). Mean differences were assessed via t-tests, medians via Wilcoxon tests, and correlations via Spearman's test. All p values were Bonferroni-adjusted.
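The Bonferroni adjustment mentioned in the Methods can be illustrated with a minimal sketch. This is not the authors' analysis code: the function name `bonferroni_adjust` and the raw p-values are hypothetical, chosen only to show how each p-value is multiplied by the number of comparisons and capped at 1.

```python
def bonferroni_adjust(p_values):
    """Multiply each raw p-value by the number of tests, capping at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values for illustration only; not taken from the study.
raw = [0.001, 0.02, 0.4]
adjusted = bonferroni_adjust(raw)

for p_raw, p_adj in zip(raw, adjusted):
    print(f"raw p = {p_raw:.3f} -> adjusted p = {p_adj:.3f}")
```

With three comparisons, a raw p-value must fall below roughly 0.0167 to remain under 0.05 after adjustment, which is why this correction is considered conservative.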
Results: LLMs provided significantly better patient education responses than experts. Mean adequacy score differences were: Le Chat Pixtral Large +1.6 (95% CI: 1.3 to 2.0, p<0.01), DeepSeek-R1 +1.7 (95% CI: 1.3 to 2.1, p<0.001), and ChatGPT-4.5 +1.9 (95% CI: 1.6 to 2.3, p<0.001). In clinical decision-making, DeepSeek-R1 outperformed experts (+1.4; 95% CI: 1.1 to 1.8, p<0.001), whereas Le Chat Pixtral Large (-0.3; 95% CI: -0.8 to 0.1, p=0.96) and ChatGPT-4.5 (+0.5; 95% CI: 0.0 to 0.9, p=0.18) performed comparably to experts. Evaluators could not distinguish between expert responses (median: 6.0, interquartile range [IQR]: 3.0 to 8.0) and LLM-generated responses (median: 6.0, IQR: 4.0 to 8.0).
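The median and IQR summaries reported for the origin ratings can be computed as in the sketch below. The ratings are hypothetical values on the 1 to 10 scale, not study data, and the "inclusive" quantile method is an assumption, since the abstract does not state which quartile definition was used.

```python
import statistics

# Hypothetical origin ratings (1 = "certainly LLM", 10 = "certainly human").
# Illustrative values only; not taken from the study.
ratings = [3, 4, 6, 6, 7, 8, 9]

median = statistics.median(ratings)

# quantiles(n=4) returns the three quartile cut points; the "inclusive"
# method is an assumed choice, other definitions give slightly different values.
q1, _, q3 = statistics.quantiles(ratings, n=4, method="inclusive")

print(f"median: {median}, IQR: {q1}-{q3}")
```

Heavily overlapping medians and IQRs for two groups of responses, as reported above, are what one would expect when raters cannot tell the groups apart.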
Conclusion: LLMs outperform experts in VTE-related patient education and match or exceed them in clinical decision-making, providing responses indistinguishable from experts. Though major barriers need to be addressed, LLMs have strong potential to support clinical management and patient education in VTE.
About the journal:
The Journal of Thrombosis and Haemostasis (JTH) serves as the official journal of the International Society on Thrombosis and Haemostasis. It is dedicated to advancing science related to thrombosis, bleeding disorders, and vascular biology through the dissemination and exchange of information and ideas within the global research community.
Types of Publications:
The journal publishes a variety of content, including:
Original research reports
State-of-the-art reviews
Brief reports
Case reports
Invited commentaries on publications in the Journal
Forum articles
Correspondence
Announcements
Scope of Contributions:
Editors invite contributions from both fundamental and clinical domains. These include:
Basic manuscripts on blood coagulation and fibrinolysis
Studies on proteins and reactions related to thrombosis and haemostasis
Research on blood platelets and their interactions with other biological systems, such as the vessel wall, blood cells, and invading organisms
Clinical manuscripts covering various topics including venous thrombosis, arterial disease, hemophilia, bleeding disorders, and platelet diseases
Clinical manuscripts may encompass etiology, diagnostics, prognosis, prevention, and treatment strategies.