Leveraging long context in retrieval augmented language models for medical question answering

IF 12.4 1区医学 Q1 HEALTH CARE SCIENCES & SERVICES

NPJ Digital Medicine Pub Date : 2025-05-02 DOI:10.1038/s41746-025-01651-w

Gongbo Zhang, Zihan Xu, Qiao Jin, Fangyi Chen, Yilu Fang, Yi Liu, Justin F. Rousseau, Ziyang Xu, Zhiyong Lu, Chunhua Weng, Yifan Peng

{"title":"Leveraging long context in retrieval augmented language models for medical question answering","authors":"Gongbo Zhang, Zihan Xu, Qiao Jin, Fangyi Chen, Yilu Fang, Yi Liu, Justin F. Rousseau, Ziyang Xu, Zhiyong Lu, Chunhua Weng, Yifan Peng","doi":"10.1038/s41746-025-01651-w","DOIUrl":null,"url":null,"abstract":"<p>While holding great promise for improving and facilitating healthcare through applications of medical literature summarization, large language models (LLMs) struggle to produce up-to-date responses on evolving topics due to outdated knowledge or hallucination. Retrieval-augmented generation (RAG) is a pivotal innovation that improves the accuracy and relevance of LLM responses by integrating LLMs with a search engine and external sources of knowledge. However, the quality of RAG responses can be largely impacted by the rank and density of key information in the retrieval results, such as the “lost-in-the-middle” problem. In this work, we aim to improve the robustness and reliability of the RAG workflow in the medical domain. Specifically, we propose a map-reduce strategy, BriefContext, to combat the “lost-in-the-middle” issue without modifying the model weights. We demonstrated the advantage of the workflow with various LLM backbones and on multiple QA datasets. This method promises to improve the safety and reliability of LLMs deployed in healthcare domains by reducing the risk of misinformation, ensuring critical clinical content is retained in generated responses, and enabling more trustworthy use of LLMs in critical tasks such as medical question answering, clinical decision support, and patient-facing applications.</p>","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"55 1","pages":""},"PeriodicalIF":12.4000,"publicationDate":"2025-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"NPJ Digital Medicine","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1038/s41746-025-01651-w","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}

引用次数: 0

Abstract

While holding great promise for improving and facilitating healthcare through applications of medical literature summarization, large language models (LLMs) struggle to produce up-to-date responses on evolving topics due to outdated knowledge or hallucination. Retrieval-augmented generation (RAG) is a pivotal innovation that improves the accuracy and relevance of LLM responses by integrating LLMs with a search engine and external sources of knowledge. However, the quality of RAG responses can be largely impacted by the rank and density of key information in the retrieval results, such as the “lost-in-the-middle” problem. In this work, we aim to improve the robustness and reliability of the RAG workflow in the medical domain. Specifically, we propose a map-reduce strategy, BriefContext, to combat the “lost-in-the-middle” issue without modifying the model weights. We demonstrated the advantage of the workflow with various LLM backbones and on multiple QA datasets. This method promises to improve the safety and reliability of LLMs deployed in healthcare domains by reducing the risk of misinformation, ensuring critical clinical content is retained in generated responses, and enabling more trustworthy use of LLMs in critical tasks such as medical question answering, clinical decision support, and patient-facing applications.

Abstract Image

查看原文本刊更多论文

在医学问题回答的检索增强语言模型中利用长上下文

虽然通过医学文献摘要的应用有望改善和促进医疗保健，但由于过时的知识或幻觉，大型语言模型（llm）难以对不断发展的主题产生最新的响应。检索增强生成（RAG）是一项关键创新，通过将法学硕士与搜索引擎和外部知识来源集成，提高了法学硕士响应的准确性和相关性。然而，RAG响应的质量在很大程度上受到检索结果中关键信息的秩和密度的影响，例如“中间丢失”问题。在这项工作中，我们的目标是提高医学领域RAG工作流的鲁棒性和可靠性。具体来说，我们提出了一种地图减少策略，即BriefContext，在不修改模型权重的情况下解决“中间丢失”问题。我们展示了使用各种LLM主干和多个QA数据集的工作流的优势。通过降低错误信息的风险，确保在生成的响应中保留关键临床内容，以及在医疗问题回答、临床决策支持和面向患者的应用程序等关键任务中更可靠地使用llm，该方法有望提高部署在医疗保健领域的llm的安全性和可靠性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

NPJ Digital Medicine Multiple-

CiteScore

25.10

自引率

3.30%

发文量

170

审稿时长

15 weeks

期刊介绍： npj Digital Medicine is an online open-access journal that focuses on publishing peer-reviewed research in the field of digital medicine. The journal covers various aspects of digital medicine, including the application and implementation of digital and mobile technologies in clinical settings, virtual healthcare, and the use of artificial intelligence and informatics. The primary goal of the journal is to support innovation and the advancement of healthcare through the integration of new digital and mobile technologies. When determining if a manuscript is suitable for publication, the journal considers four important criteria: novelty, clinical relevance, scientific rigor, and digital innovation.