Yingshuai Wang, Yanli Wan, Xingyun Lei, Qingkun Chen, Hongpu Hu
{"title":"基于检索增强生成的大型语言模型医学知识理解与推理优化方法","authors":"Yingshuai Wang , Yanli Wan , Xingyun Lei , Qingkun Chen , Hongpu Hu","doi":"10.1016/j.array.2025.100504","DOIUrl":null,"url":null,"abstract":"<div><div>Based on the existing Retrieval Augmented Generation (RAG) technology, this study proposes innovative solution to better address the hallucination issues of current large language models. By optimizing data processing, prompt engineering, and multi-retriever fusion, it resolves issues such as semantic capture bias, inaccurate context retrieval, information redundancy, hallucination generation, and length limitations. Data processing focuses on text cleaning, disambiguation, and removing redundant information to enhance consistency. Prompt engineering aids the model in better understanding the task. The adaptive weight fusion of sparse and dense retrievers improves context retrieval accuracy. Experiments conducted on the CCKS-TCMBench dataset for medical knowledge understanding and semantic reasoning show that the optimized model significantly outperforms the baseline across all evaluation metrics.</div></div>","PeriodicalId":8417,"journal":{"name":"Array","volume":"28 ","pages":"Article 100504"},"PeriodicalIF":4.5000,"publicationDate":"2025-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A retrieval augmented generation based optimization approach for medical knowledge understanding and reasoning in large language models\",\"authors\":\"Yingshuai Wang , Yanli Wan , Xingyun Lei , Qingkun Chen , Hongpu Hu\",\"doi\":\"10.1016/j.array.2025.100504\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Based on the existing Retrieval Augmented Generation (RAG) technology, this study proposes innovative solution to better address the hallucination issues of current large language models. By optimizing data processing, prompt engineering, and multi-retriever fusion, it resolves issues such as semantic capture bias, inaccurate context retrieval, information redundancy, hallucination generation, and length limitations. Data processing focuses on text cleaning, disambiguation, and removing redundant information to enhance consistency. Prompt engineering aids the model in better understanding the task. The adaptive weight fusion of sparse and dense retrievers improves context retrieval accuracy. 
Experiments conducted on the CCKS-TCMBench dataset for medical knowledge understanding and semantic reasoning show that the optimized model significantly outperforms the baseline across all evaluation metrics.</div></div>\",\"PeriodicalId\":8417,\"journal\":{\"name\":\"Array\",\"volume\":\"28 \",\"pages\":\"Article 100504\"},\"PeriodicalIF\":4.5000,\"publicationDate\":\"2025-09-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Array\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2590005625001316\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Array","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2590005625001316","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
A retrieval augmented generation based optimization approach for medical knowledge understanding and reasoning in large language models
Building on existing Retrieval Augmented Generation (RAG) technology, this study proposes an innovative solution to better address the hallucination problem in current large language models. By optimizing data processing, prompt engineering, and multi-retriever fusion, it mitigates issues such as semantic capture bias, inaccurate context retrieval, information redundancy, hallucination generation, and length limitations. Data processing focuses on text cleaning, disambiguation, and the removal of redundant information to enhance consistency. Prompt engineering helps the model better understand the task. Adaptive weighted fusion of sparse and dense retrievers improves context retrieval accuracy. Experiments conducted on the CCKS-TCMBench dataset for medical knowledge understanding and semantic reasoning show that the optimized model significantly outperforms the baseline across all evaluation metrics.
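The abstract does not spell out the fusion formula, so the following is only a minimal illustrative sketch, not the authors' implementation: it assumes sparse (e.g., BM25-style lexical) and dense (embedding-similarity) scores are min-max normalized and combined with an adaptive weight, here heuristically derived from each retriever's top-score margin. All function names and the margin heuristic are hypothetical.

    import numpy as np

    def min_max(scores: np.ndarray) -> np.ndarray:
        """Normalize scores to [0, 1]; a constant vector maps to zeros."""
        lo, hi = scores.min(), scores.max()
        return (scores - lo) / (hi - lo) if hi > lo else np.zeros_like(scores)

    def adaptive_weight(sparse: np.ndarray, dense: np.ndarray) -> float:
        """Hypothetical confidence heuristic: weight each retriever by how
        sharply its best score separates from its runner-up (score margin)."""
        def margin(s: np.ndarray) -> float:
            top2 = np.sort(s)[-2:]          # two largest scores, ascending
            return float(top2[1] - top2[0])
        m_sparse, m_dense = margin(sparse), margin(dense)
        total = m_sparse + m_dense
        return m_dense / total if total > 0 else 0.5  # weight given to dense

    def fuse(sparse_scores, dense_scores, k=3):
        """Fuse normalized sparse and dense scores with an adaptive weight
        and return the indices of the top-k candidate passages."""
        s = min_max(np.asarray(sparse_scores, dtype=float))
        d = min_max(np.asarray(dense_scores, dtype=float))
        w = adaptive_weight(s, d)           # weight assigned to the dense retriever
        fused = w * d + (1.0 - w) * s
        return np.argsort(fused)[::-1][:k], fused

    # Toy example: scores for five candidate passages from each retriever.
    sparse = [12.1, 3.4, 0.8, 7.9, 0.2]      # e.g., BM25-style lexical scores
    dense = [0.62, 0.58, 0.31, 0.74, 0.15]   # e.g., cosine similarities
    top_idx, fused = fuse(sparse, dense, k=3)
    print(top_idx, np.round(fused, 3))

In such a scheme the weight shifts toward whichever retriever is more decisive for a given query; the paper's actual adaptive weighting may use a different criterion.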