A retrieval-augmented knowledge mining method with deep thinking LLMs for biomedical research and clinical support.

IF 11.8 | CAS Zone 2 (Biology) | JCR Q1 | MULTIDISCIPLINARY SCIENCES
Yichun Feng, Jiawei Wang, Ruikun He, Lu Zhou, Yixue Li
Journal: GigaScience, volume 14
Publication date: 2025-01-06
Publication type: Journal Article
DOI: 10.1093/gigascience/giaf109 (https://doi.org/10.1093/gigascience/giaf109)
Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12448786/pdf/
Citations: 0

Abstract

A retrieval-augmented knowledge mining method with deep thinking LLMs for biomedical research and clinical support.

Background: Knowledge graphs and large language models (LLMs) are key tools for biomedical knowledge integration and reasoning, facilitating structured organization of scientific articles and discovery of complex semantic relationships. However, current methods face challenges: knowledge graph construction is limited by complex terminology, data heterogeneity, and rapid knowledge evolution, while LLMs show limitations in retrieval and reasoning, making it difficult to uncover cross-document associations and reasoning pathways.

Results: We propose a pipeline that uses LLMs to construct a Biomedical Stratified Knowledge Graph (BioStrataKG) from large-scale articles and builds the Biomedical Cross-Document Question Answering Dataset (BioCDQA) to evaluate latent knowledge retrieval and multihop reasoning. We then introduce Integrated and Progressive Retrieval-Augmented Reasoning (IP-RAR) to enhance retrieval accuracy and knowledge reasoning. IP-RAR maximizes information recall through integrated reasoning-based retrieval and refines knowledge via progressive reasoning-based generation, using self-reflection to achieve deep thinking and precise contextual understanding. Experiments show that IP-RAR improves document retrieval F1 score by 20% and answer generation accuracy by 25% over existing methods.

Conclusions: The IP-RAR helps doctors efficiently integrate treatment evidence to inform the development of personalized medication plans and enables researchers to analyze advancements and research gaps, accelerating the hypothesis generation phase of scientific discovery and decision-making.
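
The Results report a gain in document retrieval F1 score, but the abstract does not say how that score is computed. As a minimal sketch, assuming the standard set-based definition (the harmonic mean of precision and recall over retrieved versus relevant documents for one query), it could be calculated as below. The function name `retrieval_f1` and the document IDs are hypothetical and are not taken from the paper.

```python
def retrieval_f1(retrieved, relevant):
    """Return (precision, recall, F1) for one query.

    Assumption: set-based scoring over document IDs; the paper's exact
    evaluation protocol is not described in the abstract.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # documents that are both retrieved and relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical IDs: 2 of the 4 retrieved documents are among the 3 relevant ones.
p, r, f1 = retrieval_f1(["doc1", "doc2", "doc3", "doc4"], ["doc2", "doc4", "doc7"])
print(round(p, 4), round(r, 4), round(f1, 4))  # 0.5 0.6667 0.5714
```

Under this definition, a retriever that returns more candidates can raise recall while lowering precision, so an F1 gain reflects an improvement in the balance between the two rather than in recall alone.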

Source journal: GigaScience (MULTIDISCIPLINARY SCIENCES)
CiteScore: 15.50
Self-citation rate: 1.10%
Articles published: 119
Review time: 1 week
Journal description: GigaScience seeks to transform data dissemination and utilization in the life and biomedical sciences. As an online open-access open-data journal, it specializes in publishing "big-data" studies encompassing various fields. Its scope includes not only "omic" type data and the fields of high-throughput biology currently serviced by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology and other new types of large-scale shareable data.