VAIV bio-discovery service using transformer model and retrieval augmented generation.

IF 2.9 · Zone 3, Biology · Q2, Biochemical Research Methods
Seonho Kim, Juntae Yoon
BMC Bioinformatics. Journal Article. Published 2024-08-21.
DOI: 10.1186/s12859-024-05903-6 (https://doi.org/10.1186/s12859-024-05903-6)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11340140/pdf/

Abstract

Background: AI technologies such as large language models (LLMs) and machine learning have advanced considerably in their support for biomedical knowledge discovery.

Main body: We propose a novel biomedical neural search service called 'VAIV Bio-Discovery', which supports enhanced knowledge discovery and document search on unstructured text such as PubMed. It mainly handles information related to chemical compounds/drugs, genes/proteins, diseases, and their interactions (chemical compound/drug-protein/gene interactions, including drug-target, drug-drug, and drug-disease). To provide comprehensive knowledge, the system offers four search options: basic search, entity search, interaction search, and natural language search. For interaction extraction we employ T5slim_dec, which adapts the autoregressive generation task of T5 (text-to-text transfer transformer) to the interaction extraction task by removing the self-attention layer in the decoder block. The service also assists in interpreting research findings by using Retrieval Augmented Generation (RAG) to summarize the search results retrieved for a given natural language query. The search engine uses a hybrid method that combines neural search with BM25, a probabilistic ranking function.

Conclusion: As a result, our system can better understand the context, semantics and relationships between terms within the document, enhancing search accuracy. This research contributes to the rapidly evolving biomedical field by introducing a new service to access and discover relevant knowledge.
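
The abstract says the search engine combines neural search with BM25 but does not say how the two scores are fused. The sketch below is one common way to do it and is not the authors' implementation: it assumes a weighted sum of min-max normalized scores, a whitespace tokenizer, and small hand-made vectors standing in for real neural embeddings. The names `bm25_scores`, `dense_scores`, `hybrid_rank`, and the weight `alpha` are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def dense_scores(query_vec, doc_vecs):
    """Cosine similarity between the query vector and each document vector."""
    def cosine(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        nu = math.sqrt(sum(x * x for x in u))
        nv = math.sqrt(sum(y * y for y in v))
        return dot / (nu * nv) if nu and nv else 0.0
    return [cosine(query_vec, v) for v in doc_vecs]


def min_max(values):
    """Rescale scores to [0, 1] so the two signals are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_rank(query, docs, query_vec, doc_vecs, alpha=0.5):
    """Return document indices sorted by the fused score, best first."""
    lexical = min_max(bm25_scores(query, docs))
    semantic = min_max(dense_scores(query_vec, doc_vecs))
    fused = [alpha * l + (1 - alpha) * s for l, s in zip(lexical, semantic)]
    return sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)


# Toy corpus; the vectors are invented placeholders, not model output.
docs = [
    "aspirin inhibits cox1",
    "imatinib targets bcr-abl kinase",
    "metformin treats type 2 diabetes",
]
doc_vecs = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.9]]
query = "imatinib target"
query_vec = [0.1, 0.9, 0.2]

ranking = hybrid_rank(query, docs, query_vec, doc_vecs)
print([docs[i] for i in ranking])
```

In a real deployment the placeholder vectors would come from a sentence or passage encoder, and `alpha` would be tuned on held-out queries. Rank-based fusion such as reciprocal rank fusion is a frequent alternative when the two score distributions are hard to normalize.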
