Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions.

Q2 Computer Science

Pacific Symposium on Biocomputing. Pub Date: 2025-01-01. DOI: 10.1142/9789819807024_0015

Guangzhi Xiong, Qiao Jin, Xiao Wang, Minjia Zhang, Zhiyong Lu, Aidong Zhang

Volume 30, pages 199-214. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11997844/pdf/

Citations: 0

Abstract

The emergent abilities of large language models (LLMs) have demonstrated great potential in solving medical questions. LLMs can possess considerable medical knowledge, but they may still hallucinate and are inflexible when their knowledge needs updating. While Retrieval-Augmented Generation (RAG) has been proposed to enhance the medical question-answering capabilities of LLMs with external knowledge bases, it may still fail in complex cases that require multiple rounds of information seeking. To address this issue, we propose iterative RAG for medicine (i-MedRAG), in which LLMs iteratively ask follow-up queries based on previous information-seeking attempts. In each iteration of i-MedRAG, the follow-up queries are answered by a vanilla RAG system, and the queries and their answers are then used to guide query generation in the next iteration. Our experiments show that i-MedRAG improves the performance of various LLMs compared with vanilla RAG on complex questions from clinical vignettes in the United States Medical Licensing Examination (USMLE), as well as on various knowledge tests in the Massive Multitask Language Understanding (MMLU) dataset. Notably, our zero-shot i-MedRAG outperforms all existing prompt engineering and fine-tuning methods on GPT-3.5, achieving an accuracy of 69.68% on the MedQA dataset. In addition, we characterize the scaling properties of i-MedRAG with different numbers of follow-up-query iterations and different numbers of queries per iteration. Our case studies show that i-MedRAG can flexibly ask follow-up queries to form reasoning chains, providing an in-depth analysis of medical questions. To the best of our knowledge, this is the first study to incorporate follow-up queries into medical RAG.
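The iterative loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `iterative_rag`, `vanilla_rag_answer`, `format_history`, the prompt wording, and the toy stand-ins `toy_llm`, `toy_retrieve`, and `CORPUS` are all hypothetical, and a real system would call an actual LLM and a retriever over a medical corpus. The parameters `n_iterations` and `n_queries` are assumed to correspond to the two scaling dimensions the abstract mentions (iterations of follow-up queries and queries per iteration).

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
# an LLM maps a prompt to a completion; a retriever maps (query, k) to k snippets.
LLM = Callable[[str], str]
Retriever = Callable[[str, int], List[str]]


def vanilla_rag_answer(query: str, llm: LLM, retrieve: Retriever, k: int = 3) -> str:
    """Answer one query with a single retrieval round (vanilla RAG)."""
    context = "\n".join(retrieve(query, k))
    return llm(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")


def format_history(history: List[Tuple[str, str]]) -> str:
    """Render accumulated (query, answer) pairs as prompt text."""
    return "\n".join(f"Q: {q}\nA: {a}" for q, a in history)


def iterative_rag(question: str, llm: LLM, retrieve: Retriever,
                  n_iterations: int = 2, n_queries: int = 2
                  ) -> Tuple[str, List[Tuple[str, str]]]:
    """Hypothetical iterative follow-up-query loop in the spirit of i-MedRAG."""
    history: List[Tuple[str, str]] = []
    for _ in range(n_iterations):
        # Ask the LLM for new follow-up queries, conditioned on earlier Q&A pairs.
        prompt = (f"Question: {question}\nPrevious follow-up Q&A:\n"
                  f"{format_history(history)}\n"
                  f"Write {n_queries} follow-up queries, one per line.")
        queries = [line.strip() for line in llm(prompt).splitlines() if line.strip()]
        # Answer each follow-up query with vanilla RAG and keep the (query, answer) pair.
        for q in queries[:n_queries]:
            history.append((q, vanilla_rag_answer(q, llm, retrieve)))
    # Answer the original question using all accumulated follow-up Q&A.
    final = llm(f"Question: {question}\nFollow-up Q&A:\n"
                f"{format_history(history)}\nFinal answer:")
    return final, history


# --- Toy stand-ins so the sketch runs without a real model or corpus ---
CORPUS = [
    "Metformin decreases hepatic gluconeogenesis.",
    "Metformin is first-line therapy for type 2 diabetes.",
    "Insulin is required in type 1 diabetes.",
]


def toy_retrieve(query: str, k: int) -> List[str]:
    """Rank corpus entries by shared lowercase words (toy lexical retriever)."""
    words = set(query.lower().split())
    return sorted(CORPUS, key=lambda d: -len(words & set(d.lower().rstrip(".").split())))[:k]


def toy_llm(prompt: str) -> str:
    """Canned responses keyed on the prompt type; a real LLM call would go here."""
    if prompt.endswith("one per line."):
        return "mechanism of metformin\nfirst-line therapy for type 2 diabetes\nextra query"
    if prompt.endswith("Final answer:"):
        return "B"
    # Vanilla RAG prompt: echo the top-ranked snippet as the "answer".
    return prompt.split("Context:\n", 1)[1].split("\n", 1)[0]


final_answer, trace = iterative_rag(
    "Which drug is first-line for type 2 diabetes? (A) insulin (B) metformin",
    toy_llm, toy_retrieve, n_iterations=2, n_queries=2)
print(final_answer, len(trace))  # prints: B 4
```

With `n_iterations=2` and `n_queries=2`, the sketch issues four follow-up queries, each answered by one vanilla RAG call, before the final-answer prompt; the number of retrieval calls therefore grows with the product of the two parameters.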





Source journal: Pacific Symposium on Biocomputing (subject area: Medicine - Medicine (all))

CiteScore: 4.50

Self-citation rate: 0.00%

Articles published: