Guanchu Wang, Junhao Ran, Ruixiang Tang, Chia-Yuan Chang, Yu-Neng Chuang, Zirui Liu, Vladimir Braverman, Zhandong Liu, Xia Hu
{"title":"评估和增强罕见病问答中的大型语言模型","authors":"Guanchu Wang, Junhao Ran, Ruixiang Tang, Chia-Yuan Chang, Chia-Yuan Chang, Yu-Neng Chuang, Zirui Liu, Vladimir Braverman, Zhandong Liu, Xia Hu","doi":"arxiv-2408.08422","DOIUrl":null,"url":null,"abstract":"Despite the impressive capabilities of Large Language Models (LLMs) in\ngeneral medical domains, questions remain about their performance in diagnosing\nrare diseases. To answer this question, we aim to assess the diagnostic\nperformance of LLMs in rare diseases, and explore methods to enhance their\neffectiveness in this area. In this work, we introduce a rare disease\nquestion-answering (ReDis-QA) dataset to evaluate the performance of LLMs in\ndiagnosing rare diseases. Specifically, we collected 1360 high-quality\nquestion-answer pairs within the ReDis-QA dataset, covering 205 rare diseases.\nAdditionally, we annotated meta-data for each question, facilitating the\nextraction of subsets specific to any given disease and its property. Based on\nthe ReDis-QA dataset, we benchmarked several open-source LLMs, revealing that\ndiagnosing rare diseases remains a significant challenge for these models. To facilitate retrieval augmentation generation for rare disease diagnosis,\nwe collect the first rare diseases corpus (ReCOP), sourced from the National\nOrganization for Rare Disorders (NORD) database. Specifically, we split the\nreport of each rare disease into multiple chunks, each representing a different\nproperty of the disease, including their overview, symptoms, causes, effects,\nrelated disorders, diagnosis, and standard therapies. This structure ensures\nthat the information within each chunk aligns consistently with a question.\nExperiment results demonstrate that ReCOP can effectively improve the accuracy\nof LLMs on the ReDis-QA dataset by an average of 8%. Moreover, it significantly\nguides LLMs to generate trustworthy answers and explanations that can be traced\nback to existing literature.","PeriodicalId":501309,"journal":{"name":"arXiv - CS - Computational Engineering, Finance, and Science","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Assessing and Enhancing Large Language Models in Rare Disease Question-answering\",\"authors\":\"Guanchu Wang, Junhao Ran, Ruixiang Tang, Chia-Yuan Chang, Chia-Yuan Chang, Yu-Neng Chuang, Zirui Liu, Vladimir Braverman, Zhandong Liu, Xia Hu\",\"doi\":\"arxiv-2408.08422\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Despite the impressive capabilities of Large Language Models (LLMs) in\\ngeneral medical domains, questions remain about their performance in diagnosing\\nrare diseases. To answer this question, we aim to assess the diagnostic\\nperformance of LLMs in rare diseases, and explore methods to enhance their\\neffectiveness in this area. In this work, we introduce a rare disease\\nquestion-answering (ReDis-QA) dataset to evaluate the performance of LLMs in\\ndiagnosing rare diseases. Specifically, we collected 1360 high-quality\\nquestion-answer pairs within the ReDis-QA dataset, covering 205 rare diseases.\\nAdditionally, we annotated meta-data for each question, facilitating the\\nextraction of subsets specific to any given disease and its property. Based on\\nthe ReDis-QA dataset, we benchmarked several open-source LLMs, revealing that\\ndiagnosing rare diseases remains a significant challenge for these models. 
To facilitate retrieval augmentation generation for rare disease diagnosis,\\nwe collect the first rare diseases corpus (ReCOP), sourced from the National\\nOrganization for Rare Disorders (NORD) database. Specifically, we split the\\nreport of each rare disease into multiple chunks, each representing a different\\nproperty of the disease, including their overview, symptoms, causes, effects,\\nrelated disorders, diagnosis, and standard therapies. This structure ensures\\nthat the information within each chunk aligns consistently with a question.\\nExperiment results demonstrate that ReCOP can effectively improve the accuracy\\nof LLMs on the ReDis-QA dataset by an average of 8%. Moreover, it significantly\\nguides LLMs to generate trustworthy answers and explanations that can be traced\\nback to existing literature.\",\"PeriodicalId\":501309,\"journal\":{\"name\":\"arXiv - CS - Computational Engineering, Finance, and Science\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Computational Engineering, Finance, and Science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.08422\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computational Engineering, Finance, and Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.08422","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Assessing and Enhancing Large Language Models in Rare Disease Question-answering
Despite the impressive capabilities of Large Language Models (LLMs) in general medical domains, it remains unclear how well they perform in diagnosing rare diseases. To answer this question, we assess the diagnostic performance of LLMs on rare diseases and explore methods to enhance their effectiveness in this area. In this work, we introduce a rare disease question-answering (ReDis-QA) dataset to evaluate the performance of LLMs in diagnosing rare diseases. Specifically, we collected 1,360 high-quality question-answer pairs for the ReDis-QA dataset, covering 205 rare diseases. Additionally, we annotated metadata for each question, facilitating the extraction of subsets specific to any given disease and any of its properties.
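As a rough illustration of how such metadata could be used, the sketch below assumes a hypothetical per-question schema with disease and property annotations (the actual ReDis-QA field names may differ) and filters out a disease- and property-specific subset:

```python
# Minimal sketch of metadata-based subset extraction (hypothetical schema).
# Each record is assumed to carry "disease" and "property" annotations;
# the real ReDis-QA field names may differ.
from typing import Dict, List

def extract_subset(questions: List[Dict], disease: str, prop: str) -> List[Dict]:
    """Return the question-answer pairs annotated with the given disease and property."""
    return [
        q for q in questions
        if q.get("disease") == disease and q.get("property") == prop
    ]

# Toy usage with placeholder records:
toy_questions = [
    {"disease": "Fabry disease", "property": "symptoms", "question": "...", "answer": "..."},
    {"disease": "Fabry disease", "property": "diagnosis", "question": "...", "answer": "..."},
]
print(len(extract_subset(toy_questions, "Fabry disease", "symptoms")))  # -> 1
```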
Based on the ReDis-QA dataset, we benchmarked several open-source LLMs, revealing that diagnosing rare diseases remains a significant challenge for these models.

To facilitate retrieval-augmented generation (RAG) for rare disease diagnosis, we collected the first rare disease corpus (ReCOP), sourced from the National Organization for Rare Disorders (NORD) database. Specifically, we split the report of each rare disease into multiple chunks, each representing a different property of the disease, including its overview, symptoms, causes, effects, related disorders, diagnosis, and standard therapies. This structure ensures that the information within each chunk aligns consistently with a question.
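To make the chunking-and-retrieval idea concrete, the sketch below works through a minimal example; the report format, property names, and the keyword-overlap retriever are assumptions for illustration, not the actual ReCOP pipeline or the retriever used in the paper:

```python
# Minimal sketch of property-based chunking and keyword retrieval.
# The report format, property names, and retriever are illustrative
# assumptions; the actual ReCOP pipeline may differ.
from typing import Dict, List

PROPERTIES = ["overview", "symptoms", "causes", "effects",
              "related disorders", "diagnosis", "standard therapies"]

def chunk_report(disease: str, report: Dict[str, str]) -> List[Dict]:
    """Split one disease report into one chunk per property."""
    return [
        {"disease": disease, "property": p, "text": report[p]}
        for p in PROPERTIES if p in report
    ]

def retrieve(question: str, chunks: List[Dict]) -> Dict:
    """Return the chunk with the largest word overlap with the question (toy retriever)."""
    q_words = set(question.lower().split())
    return max(chunks, key=lambda c: len(q_words & set(c["text"].lower().split())))

# Toy usage: the retrieved chunk would be prepended to the LLM prompt.
chunks = chunk_report("Fabry disease", {
    "symptoms": "Episodes of burning pain, angiokeratomas, reduced sweating ...",
    "diagnosis": "Enzyme assay measuring alpha-galactosidase A activity ...",
})
best = retrieve("Which enzyme assay is used to diagnose Fabry disease?", chunks)
print(best["property"])  # -> diagnosis
```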
Experimental results demonstrate that ReCOP improves the accuracy of LLMs on the ReDis-QA dataset by an average of 8%. Moreover, it guides LLMs to generate trustworthy answers and explanations that can be traced back to existing literature.
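For instance, a retrieval-augmented prompt could attach the retrieved chunk together with its provenance so that the generated answer can be traced back to the source report; the template below is purely illustrative and not the paper's actual prompt:

```python
# Illustrative prompt assembly for retrieval-augmented answering.
# The template and field names are assumptions, not the paper's actual prompt.
def build_prompt(question: str, chunk: dict) -> str:
    """Prepend the retrieved chunk and its provenance so the answer can cite it."""
    return (
        f"Context (NORD report on {chunk['disease']}, section: {chunk['property']}):\n"
        f"{chunk['text']}\n\n"
        f"Question: {question}\n"
        "Answer using only the context above and cite the section you relied on."
    )

prompt = build_prompt(
    "Which enzyme assay is used to diagnose Fabry disease?",
    {"disease": "Fabry disease", "property": "diagnosis",
     "text": "Enzyme assay measuring alpha-galactosidase A activity ..."},
)
print(prompt)
```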