Comparative analysis of large language models on rare disease identification.
Guangyu Ao, Min Chen, Jing Li, Huibing Nie, Lei Zhang, Zejun Chen
Orphanet Journal of Rare Diseases, vol. 20, no. 1, article 150. Published 2025-04-01.
DOI: https://doi.org/10.1186/s13023-025-03656-w
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11959745/pdf/
Abstract
Diagnosing rare diseases is challenging due to their low prevalence, diverse presentations, and limited recognition, often leading to diagnostic delays and errors. This study evaluates the effectiveness of multiple large language models (LLMs) in identifying rare diseases, comparing their performance with that of human physicians using real clinical cases. We analyzed 152 rare disease cases from the Chinese Medical Case Repository using four LLMs: ChatGPT-4o, Claude 3.5 Sonnet, Gemini Advanced, and Llama 3.1 405B. Overall, the LLMs performed better than human physicians, and Claude 3.5 Sonnet achieved the highest accuracy at 78.9%, significantly surpassing the accuracy of human physicians, which was 26.3%. These findings suggest that LLMs can improve rare disease diagnosis and serve as valuable tools in clinical settings, particularly in regions with limited resources. However, further validation and careful consideration of ethical and privacy issues are necessary for their effective integration into medical practice.
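As a rough sanity check on the figures quoted in the abstract, the sketch below backs out the whole-number case counts that the reported percentages imply for 152 cases, and attaches a 95% Wilson score interval to each. The counts are inferred from rounding, not taken from the paper, and the helper names (`implied_correct`, `wilson_interval`) are illustrative. The abstract does not say which statistical test the authors used, so this is not a reproduction of their analysis.

```python
from math import sqrt

N_CASES = 152  # number of rare disease cases stated in the abstract


def implied_correct(accuracy_pct, n=N_CASES):
    """Nearest whole number of correct cases consistent with a reported percentage.

    Assumption: the reported accuracy is correct_cases / n, rounded to one decimal.
    """
    return round(accuracy_pct / 100 * n)


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion k/n (z=1.96 gives ~95%)."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Accuracies as reported in the abstract (percent).
reported = {"Claude 3.5 Sonnet": 78.9, "Human physicians": 26.3}

results = {}
for name, pct in reported.items():
    k = implied_correct(pct)
    lo, hi = wilson_interval(k, N_CASES)
    results[name] = (k, lo, hi)
    print(f"{name}: {k}/{N_CASES} = {k / N_CASES:.1%}, 95% CI [{lo:.1%}, {hi:.1%}]")
```

Under these assumptions the two intervals do not overlap, which is consistent with the abstract's description of the difference as significant. If the physicians and the models were scored on the same cases, a paired test such as McNemar's would be the more appropriate comparison, but that requires per-case outcomes that the abstract does not give.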
About the journal:
Orphanet Journal of Rare Diseases is an open access, peer-reviewed journal that encompasses all aspects of rare diseases and orphan drugs. The journal publishes high-quality reviews on specific rare diseases. In addition, the journal may consider articles on clinical trial outcome reports, either positive or negative, and articles on public health issues in the field of rare diseases and orphan drugs. The journal does not accept case reports.