Improving automated deep phenotyping through large language models using retrieval-augmented generation.

IF 10.4 | Zone 1, Biology | JCR Q1, GENETICS & HEREDITY

Genome Medicine. Pub Date: 2025-08-18. DOI: 10.1186/s13073-025-01521-w

Brandon T Garcia, Lauren Westerfield, Priya Yelemali, Nikhita Gogate, E Andres Rivera-Munoz, Haowei Du, Moez Dawood, Angad Jolly, James R Lupski, Jennifer E Posey

Genome Medicine, volume 17, issue 1, article 91. Publication type: Journal Article. PMC full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12359922/pdf/

Citations: 0

Abstract

Background: Diagnosing rare genetic disorders relies on precise phenotypic and genotypic analysis, with the Human Phenotype Ontology (HPO) providing a standardized language for capturing clinical phenotypes. Rule-based HPO extraction tools use concept recognition to automatically identify phenotypes, but they often struggle with incomplete phenotype assignment, requiring significant manual review. While large language models (LLMs) hold promise for more context-driven phenotype extraction, they are prone to errors and "hallucinations," making them less reliable without further refinement. We present RAG-HPO, a Python-based tool that leverages retrieval-augmented generation (RAG) to improve the accuracy of HPO term assignment by LLMs. This approach bypasses the limitations of baseline models and eliminates the need for time- and resource-intensive fine-tuning. RAG-HPO integrates a dynamic vector database, containing >54,000 phenotypic phrases mapped to HPO IDs, which allows real-time retrieval and contextual matching. The RAG-HPO workflow begins by extracting phenotypic phrases from clinical text via an LLM and then matching them via semantic similarity to entries within the database. The best term matches are returned to the LLM as context for final HPO term assignment of each phrase.
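The retrieval step described above (match an extracted phrase against a phrase-to-HPO database by similarity, then hand the best matches back to the LLM as context) can be illustrated with a minimal sketch. This is not the RAG-HPO implementation: the bag-of-words `embed` function stands in for a real embedding model, the five-entry `PHRASE_DB` stands in for the >54,000-phrase vector database, and the function names `retrieve` and `build_prompt` are hypothetical.

```python
import math
from collections import Counter

# Toy phrase-to-HPO table (illustrative only; the real database is far larger).
PHRASE_DB = {
    "seizure": "HP:0001250",
    "recurrent seizures": "HP:0001250",
    "global developmental delay": "HP:0001263",
    "small head circumference": "HP:0000252",
    "low muscle tone": "HP:0001252",
}


def embed(text):
    """Stand-in embedding: lowercase token counts (an assumption, not a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(phrase, k=3):
    """Return up to k (database phrase, HPO ID, score) matches with nonzero similarity."""
    query = embed(phrase)
    scored = [(cosine(query, embed(p)), p, hpo) for p, hpo in PHRASE_DB.items()]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(p, hpo, round(s, 3)) for s, p, hpo in scored[:k] if s > 0]


def build_prompt(phrase, candidates):
    """Format retrieved candidates as context for the final LLM assignment step."""
    lines = [f"- {p} ({hpo})" for p, hpo, _ in candidates]
    return (
        f"Clinical phrase: {phrase}\n"
        "Candidate HPO terms:\n" + "\n".join(lines) + "\n"
        "Choose the single best HPO ID for the phrase."
    )


if __name__ == "__main__":
    matches = retrieve("low tone")
    print(build_prompt("low tone", matches))
```

Constraining the LLM to choose among retrieved candidates, rather than recalling HPO IDs from memory, is the design choice that the abstract credits with reducing hallucinated terms.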

Results: Performance was benchmarked on 112 published case reports with 1792 manually assigned HPO terms and compared to Doc2HPO, ClinPhen, and FastHPOCR. In evaluations, RAG-HPO + LLaMa-3.1 70B achieved a mean precision of 0.81, recall of 0.76, and an F1 score of 0.78, significantly surpassing conventional tools (p < 0.00001). RAG-HPO returned 1648 terms, of which 19.1% (315) were false positives that did not exactly match our manually annotated standard. Among these, <1% (1/315) represented hallucinations, and 1.3% (4/315) represented terms with no ontological relationship to the desired target; the remaining false positives (95.2%, 300/315) were broader ancestor terms of the target term, which may still be relevant to users in many contexts.
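The reported figures can be checked with a short calculation. The sketch below uses only the counts and scores quoted above; the helper names `f1_score` and `share` are hypothetical. One caveat: the abstract reports mean per-report precision and recall, so applying the harmonic-mean formula to those means is only an approximate consistency check, not a reproduction of the authors' evaluation.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def share(count, total):
    """Percentage of total represented by count, rounded to one decimal place."""
    return round(100 * count / total, 1)


returned_terms = 1648   # terms returned by RAG-HPO
false_positives = 315   # terms not exactly matching the manual annotation

if __name__ == "__main__":
    print("F1 from mean P and R:", round(f1_score(0.81, 0.76), 2))
    print("False positives (% of returned):", share(false_positives, returned_terms))
    print("Ancestor terms (% of FPs):", share(300, false_positives))
    print("Unrelated terms (% of FPs):", share(4, false_positives))
    print("Hallucinations (% of FPs):", share(1, false_positives))
```

The computed values agree with the abstract: an F1 of 0.78, a 19.1% false-positive share, and 95.2%, 1.3%, and 0.3% (reported as <1%) for the three false-positive categories.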

Conclusions: RAG-HPO is a user-friendly, adaptable tool designed for secure evaluation of clinical text and outperforms standard HPO-matching tools in precision, recall, and F1. Its enhanced precision and recall represent a substantial advancement in phenotypic analysis, accelerating the identification of genetic mechanisms underlying rare diseases and driving progress in genetic research and clinical genomics. RAG-HPO is available at https://github.com/PoseyPod/RAG-HPO.




Source journal

Genome Medicine (GENETICS & HEREDITY)

CiteScore: 20.80

Self-citation rate: 0.80%

Articles published: 128

Review time: 6-12 weeks

About the journal: Genome Medicine is an open access journal that publishes outstanding research applying genetics, genomics, and multi-omics to understand, diagnose, and treat disease. Bridging basic science and clinical research, it covers areas such as cancer genomics, immuno-oncology, immunogenomics, infectious disease, microbiome, neurogenomics, systems medicine, clinical genomics, gene therapies, precision medicine, and clinical trials. The journal publishes original research, methods, software, and reviews to serve authors and promote broad interest and importance in the field.