Assessing and Enhancing Large Language Models in Rare Disease Question-answering
Guanchu Wang, Junhao Ran, Ruixiang Tang, Chia-Yuan Chang, Yu-Neng Chuang, Zirui Liu, Vladimir Braverman, Zhandong Liu, Xia Hu
arXiv:2408.08422. arXiv - CS - Computational Engineering, Finance, and Science. Published 2024-08-15.
Abstract
Despite the impressive capabilities of Large Language Models (LLMs) in
general medical domains, questions remain about their performance in diagnosing
rare diseases. To address these questions, we assess the diagnostic
performance of LLMs on rare diseases and explore methods to enhance their
effectiveness in this area. In this work, we introduce a rare disease
question-answering (ReDis-QA) dataset to evaluate the performance of LLMs in
diagnosing rare diseases. Specifically, we collected 1360 high-quality
question-answer pairs within the ReDis-QA dataset, covering 205 rare diseases.
Additionally, we annotated metadata for each question, facilitating the
extraction of subsets specific to any given disease and property. Based on
the ReDis-QA dataset, we benchmarked several open-source LLMs, revealing that
diagnosing rare diseases remains a significant challenge for these models.
To facilitate retrieval-augmented generation for rare disease diagnosis,
we collect the first rare disease corpus (ReCOP), sourced from the National
Organization for Rare Disorders (NORD) database. Specifically, we split the
report of each rare disease into multiple chunks, each representing a different
property of the disease, including its overview, symptoms, causes, effects,
related disorders, diagnosis, and standard therapies. This structure ensures
that the information within each chunk aligns consistently with a question.
Experimental results demonstrate that ReCOP improves the accuracy
of LLMs on the ReDis-QA dataset by an average of 8%. Moreover, it
guides LLMs to generate trustworthy answers and explanations that can be traced
back to existing literature.
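
The property-wise chunking and retrieval augmentation described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: only the seven property names come from the abstract, while the `Chunk` class, the functions `split_report`, `retrieve`, and `build_prompt`, the word-overlap scoring (a stand-in for whatever retriever is actually used), and the toy report text are all hypothetical and chosen for illustration.

```python
import re
from dataclasses import dataclass

# Property names as listed in the abstract; all other names are hypothetical.
PROPERTIES = ["overview", "symptoms", "causes", "effects",
              "related disorders", "diagnosis", "standard therapies"]


@dataclass
class Chunk:
    disease: str
    prop: str
    text: str


def split_report(disease, report):
    """Turn one report (a dict of property -> text) into one chunk per property."""
    return [Chunk(disease, p, report[p]) for p in PROPERTIES if p in report]


def tokens(text):
    """Lowercased set of alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, chunks, k=1):
    """Rank chunks by word overlap with the question.

    Assumption: plain word overlap replaces a real sparse or dense retriever.
    """
    q = tokens(question)
    ranked = sorted(chunks,
                    key=lambda c: len(q & tokens(c.prop + " " + c.text)),
                    reverse=True)
    return ranked[:k]


def build_prompt(question, chunks):
    """Prepend the retrieved chunks to the question, labeled by source."""
    context = "\n".join(f"[{c.disease} / {c.prop}] {c.text}" for c in chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Toy, invented report text; it is not taken from NORD or ReCOP.
toy_report = {"symptoms": "fever and joint pain", "causes": "a gene mutation"}
chunks = split_report("Disease X", toy_report)
question = "What are the symptoms of fever?"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
```

Because each chunk carries its disease and property label, the prompt records where the supporting text came from, which is one plausible way an answer could be traced back to its source passage.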