CMedRAGBot: A Chinese Medical Chatbot Based on Graph RAG and Large Language Models
Dongfang Zhang, Haoze Du, Xiaolei Wang, Mingdong Zhu, Xiaoxiao Pang, Dongqing Wei, Xianfang Wang
Interdisciplinary Sciences: Computational Life Sciences, published 2025-06-05
DOI: 10.1007/s12539-025-00715-5
Citations: 0
Abstract
In Chinese clinical medical question answering (QA), general-purpose Large Language Models (LLMs) suffer from hallucinations and are difficult to update with new knowledge, both of which hurt performance on knowledge-intensive tasks. To address these issues, this research presents CMedRAGBot, a Chinese clinical medical QA model that integrates Retrieval-Augmented Generation (RAG) with a medical knowledge graph. First, a Chinese medical knowledge graph is constructed that covers multiple entity types, including diseases, medications, and symptoms. Based on this knowledge graph, a Named Entity Recognition (NER) model built on a Chinese-RoBERTa and BiGRU architecture is designed, with data augmentation strategies used to improve its generalization. In addition, prompt engineering is used to implement intent recognition, mapping each user query to predefined intent categories. Finally, these modules are integrated into a complete Chinese clinical medical QA system. In the experimental evaluation, CMedRAGBot is deployed on five state-of-the-art LLMs (ChatGPT-4o, ChatGPT-o1, DeepSeek-R1, Llama-3.3-70B-Instruct, and Gemini 2.0 Flash) and tested on specialized question banks derived from the Chinese Clinical Medical Qualification Examinations and the Residency Standardization Training Examinations from 2000 to 2023. The results indicate that integrating CMedRAGBot significantly improves the test accuracy of all five models, with increases of up to approximately 10%. Furthermore, ablation experiments show that data augmentation raises the NER model's F1 score from 95.27% to 97.55%, and that adding the intent recognition module markedly improves the system's ability to understand complex queries, further boosting answer accuracy. The source code is available at https://github.com/zhdongfang/CMedRAGBot .
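The flow described in the abstract (recognize entities, recognize intent, retrieve from the knowledge graph, then prompt the LLM) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy graph, the intent names, the relation names, and every function here are hypothetical. A dictionary lookup stands in for the Chinese-RoBERTa and BiGRU NER model, and keyword cues stand in for the prompt-based intent recognizer. The sketch only shows how retrieved graph facts might be placed in front of a question before it is sent to an LLM.

```python
# Toy knowledge graph: (head entity, relation) -> tail entities.
# All entries are illustrative placeholders, not data from the paper.
TOY_GRAPH = {
    ("influenza", "has_symptom"): ["fever", "cough"],
    ("influenza", "treated_by"): ["oseltamivir"],
    ("hypertension", "has_symptom"): ["headache"],
    ("hypertension", "treated_by"): ["amlodipine"],
}

# Hypothetical intent categories, each mapped to one graph relation.
INTENT_TO_RELATION = {
    "ask_symptom": "has_symptom",
    "ask_treatment": "treated_by",
}

# Keyword cues: an assumed stand-in for prompt-based intent recognition.
INTENT_CUES = {
    "ask_symptom": ("symptom", "sign"),
    "ask_treatment": ("treat", "drug", "medication"),
}


def recognize_entities(question: str) -> list[str]:
    """Stand-in for the NER model: match known disease names in the text."""
    text = question.lower()
    diseases = sorted({head for head, _ in TOY_GRAPH})
    return [d for d in diseases if d in text]


def recognize_intents(question: str) -> list[str]:
    """Map the question to zero or more predefined intent categories."""
    text = question.lower()
    return [
        intent
        for intent, cues in INTENT_CUES.items()
        if any(cue in text for cue in cues)
    ]


def retrieve_facts(entities: list[str], intents: list[str]) -> list[str]:
    """Look up graph triples for every (entity, intent) combination."""
    facts = []
    for entity in entities:
        for intent in intents:
            relation = INTENT_TO_RELATION[intent]
            for tail in TOY_GRAPH.get((entity, relation), []):
                facts.append(f"{entity} {relation} {tail}")
    return facts


def build_prompt(question: str) -> str:
    """Prepend retrieved facts to the question to form the LLM prompt."""
    facts = retrieve_facts(
        recognize_entities(question), recognize_intents(question)
    )
    context = "\n".join(f"- {f}" for f in facts) or "- (no matching facts)"
    return (
        f"Known facts:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer using the facts above."
    )


if __name__ == "__main__":
    print(build_prompt("What drugs treat influenza?"))
```

In this sketch the intent decides which relation is queried, so a treatment question retrieves only treatment triples. That is one plausible reason an intent module could help with complex queries, though the abstract does not describe the mechanism in this detail.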
About the journal:
Interdisciplinary Sciences: Computational Life Sciences aims to cover the most recent and outstanding developments in interdisciplinary areas of science, with a particular focus on computational life sciences, an area that is developing rapidly at the forefront of scientific research and technology.
The journal publishes original papers of significant general interest covering recent research and developments. Articles are published rapidly, taking full advantage of internet technology for online submission and peer review of manuscripts, and appear as Online First™ articles on SpringerLink even before the issue is built or sent to the printer.
The editorial board includes many leading scientists of international reputation, among them Luc Montagnier (UNESCO, France), Dennis Salahub (University of Calgary, Canada), and Weitao Yang (Duke University, USA). Prof. Dongqing Wei of Shanghai Jiao Tong University is the editor-in-chief; he has made important contributions to bioinformatics and computational physics and is best known for his groundbreaking work on the theory of ferroelectric liquids. With the help of a team of associate editors and the editorial board, the journal aims to build a sound international reputation.