Evaluating large language models for WAO/EAACI guideline compliance in hereditary angioedema management.

IF 2.1 | Zone 4 (Medicine) | JCR Q3 (Allergy)
Allergologia et immunopathologia Pub Date : 2025-07-01 eCollection Date: 2025-01-01 DOI:10.15586/aei.v53i4.1353
Mehmet Emin Gerek, Tuğba Önalan, Fatih Çölkesen, Şevket Arslan
Allergologia et immunopathologia, vol. 53, no. 4, pp. 51-59. Journal Article. https://doi.org/10.15586/aei.v53i4.1353

Abstract

Introduction: Hereditary angioedema (HAE) is a rare but potentially life-threatening disorder characterized by recurrent swelling episodes. Adherence to clinical guidelines, such as the World Allergy Organization/European Academy of Allergy & Clinical Immunology (WAO/EAACI) guidelines, is crucial for effective management. With the increasing role of artificial intelligence in medicine, large language models (LLMs) offer potential for clinical decision support. This study evaluates the performance of ChatGPT, Gemini, Perplexity, and Copilot in providing guideline-adherent responses for HAE management.

Methods: Twenty-eight key recommendations from the WAO/EAACI HAE guidelines were reformulated into interrogative formats and posed to the selected LLMs. Two independent clinicians assessed responses based on accuracy, adequacy, clarity, and citation reliability using a five-point Likert scale. References were categorized as guideline-based, trustworthy, or untrustworthy. A reevaluation with explicit citation instructions was conducted, with discrepancies resolved by a third reviewer.
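
The rating workflow described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' procedure: the names `Rating`, `CRITERIA`, and `reconcile` are hypothetical, and the rule that the third reviewer's score replaces a disputed score is an assumption, since the abstract only states that a third reviewer resolved discrepancies.

```python
from dataclasses import dataclass

# The four criteria named in the Methods, each scored on a 1-5 Likert scale.
CRITERIA = ("accuracy", "adequacy", "clarity", "citation_reliability")


@dataclass(frozen=True)
class Rating:
    """One clinician's scores for one LLM response (hypothetical structure)."""
    accuracy: int
    adequacy: int
    clarity: int
    citation_reliability: int

    def __post_init__(self):
        # Reject anything outside the five-point scale.
        for name in CRITERIA:
            score = getattr(self, name)
            if not 1 <= score <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {score}")


def reconcile(first: Rating, second: Rating, third: Rating) -> Rating:
    """Keep scores the two raters agree on; otherwise use the third reviewer's.

    Assumption: the abstract does not say how the third reviewer resolved
    discrepancies, so this rule is only illustrative.
    """
    resolved = {}
    for name in CRITERIA:
        a, b = getattr(first, name), getattr(second, name)
        resolved[name] = a if a == b else getattr(third, name)
    return Rating(**resolved)


# Made-up scores for a single response, not data from the study.
rater_1 = Rating(accuracy=5, adequacy=4, clarity=5, citation_reliability=3)
rater_2 = Rating(accuracy=5, adequacy=5, clarity=5, citation_reliability=2)
rater_3 = Rating(accuracy=5, adequacy=4, clarity=4, citation_reliability=2)

final = reconcile(rater_1, rater_2, rater_3)
print(final)
```

Scoring each criterion separately keeps agreement and disagreement visible per criterion, so a dispute over citation reliability does not affect an agreed accuracy score.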

Results: ChatGPT and Gemini outperformed Perplexity and Copilot, with median accuracy and adequacy scores of 5.0 versus 3.0. ChatGPT had the lowest rate of unreliable references, whereas Gemini showed inconsistency in citation behavior. Significant differences in response quality were observed among models (p < 0.001). Providing explicit sourcing instructions improved performance consistency, particularly for Gemini.
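
A comparison of this kind, medians per model plus a test for differences among models, can be sketched as below. The abstract does not name the statistical test, so the Kruskal-Wallis H test is an assumption here, chosen because it is a common option for ordinal Likert scores across more than two groups. The model labels and scores are invented for illustration and are not data from the study; `average_ranks` and `kruskal_wallis_h` are hypothetical helper names.

```python
from collections import Counter
from statistics import median


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a dict of score lists."""
    pooled = [score for scores in groups.values() for score in scores]
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    weighted = 0.0
    offset = 0
    for scores in groups.values():
        rank_sum = sum(ranks[offset:offset + len(scores)])
        weighted += rank_sum ** 2 / len(scores)
        offset += len(scores)
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

    # Tie correction: Likert data contain many repeated values.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all scores are identical; H is undefined")
    return h / correction


# Invented accuracy scores for three placeholder models (not study data).
scores = {
    "model_a": [5, 5, 4, 5, 5],
    "model_b": [3, 3, 4, 2, 3],
    "model_c": [4, 3, 5, 4, 4],
}

medians = {name: median(values) for name, values in scores.items()}
print(medians)
print(kruskal_wallis_h(scores))
```

Under the null hypothesis, H approximately follows a chi-square distribution with one fewer degree of freedom than the number of groups. A library routine such as `scipy.stats.kruskal` returns the p-value directly.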

Conclusion: ChatGPT and Gemini demonstrated superior adherence to WAO/EAACI guidelines, suggesting that LLMs can support clinical decision-making in rare diseases. However, inconsistencies in citation practices highlight the need for further validation and optimization to enhance reliability in medical applications.

Source journal: Allergologia et immunopathologia
CiteScore: 3.70
Self-citation rate: 0.00%
Articles published: 131
Review time: 6-12 weeks
Journal description: Founded in 1972 by Professor A. Oehling, Allergologia et Immunopathologia is a forum for those working in the field of pediatric asthma, allergy and immunology. Manuscripts on clinical, epidemiological and experimental allergy and immunopathology related to childhood will be considered for publication. Allergologia et Immunopathologia is the official journal of the Spanish Society of Pediatric Allergy and Clinical Immunology (SEICAP) and also of the Latin American Society of Immunodeficiencies (LASID). It has an independent international Editorial Committee, which submits received papers for peer review by international experts. The journal accepts original and review articles from all over the world, together with consensus statements from the aforementioned societies. Occasionally, the opinion of an expert on a burning topic is published in the "Point of View" section. Letters to the Editor on previously published papers are welcomed. Allergologia et Immunopathologia publishes 6 issues per year and is included in major databases such as PubMed, Scopus, and Web of Knowledge.