LmRaC: a functionally extensible tool for LLM interrogation of user experimental results.

Douglas B Craig, Sorin Drăghici
Bioinformatics (Oxford, England), published 2024-11-15.
DOI: 10.1093/bioinformatics/btae679
Citations: 0

Abstract


Motivation: Large Language Models (LLMs) have provided spectacular results across a wide variety of domains. However, persistent concerns about hallucination and fabrication of authoritative sources raise serious issues for their integral use in scientific research. Retrieval-augmented generation (RAG) is a technique for making data and documents, otherwise unavailable during training, available to the LLM for reasoning tasks. In addition to making dynamic and quantitative data available to the LLM, RAG provides the means by which to carefully control and trace source material, thereby ensuring results are accurate, complete and authoritative.
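
The retrieval step of a RAG pipeline can be illustrated with a toy sketch: score indexed paragraphs against the question and return the top hits together with their indices, so that every answer can cite the exact paragraph it came from. This is a generic illustration under stated assumptions, not LmRaC's implementation — the corpus, the bag-of-words scoring, and the `retrieve` function are hypothetical stand-ins (production systems typically use dense embeddings rather than token overlap).

```python
import math
import re
from collections import Counter

def bow(text):
    """Bag-of-words vector of a text as a Counter of lowercase tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, paragraphs, k=2):
    """Rank paragraphs by similarity to the question; return the top-k
    non-zero matches as (index, paragraph) pairs for citation."""
    scored = sorted(
        ((cosine(bow(question), bow(p)), i, p) for i, p in enumerate(paragraphs)),
        reverse=True,
    )
    return [(i, p) for s, i, p in scored[:k] if s > 0]

# Hypothetical corpus standing in for indexed PubMed paragraphs.
corpus = [
    "TP53 mutations are frequent in many human cancers.",
    "Docker images package applications with their dependencies.",
    "Pathway analysis relates gene expression changes to biological function.",
]
hits = retrieve("Which gene is mutated in cancers?", corpus, k=1)
```

The retrieved paragraphs, with their indices, would then be passed to the LLM as context, which is what lets answers be traced back to specific source passages.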

Results: Here we introduce LmRaC, an LLM-based tool capable of answering complex scientific questions in the context of a user's own experimental results. LmRaC allows users to dynamically build domain-specific knowledge bases from PubMed sources (RAGdom). Answers are drawn solely from this RAG, with citations to the paragraph level, virtually eliminating any chance of hallucination or fabrication. These answers can then be used to construct an experimental context (RAGexp) that, along with user-supplied documents (e.g., design, protocols) and quantitative results, can be used to answer questions about the user's specific experiment. Questions about quantitative experimental data are integral to LmRaC and are supported by a user-defined and functionally extensible REST API server (RAGfun).
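
The RAGfun idea — exposing user-defined quantitative functions to the LLM through a REST server — can be sketched with Python's standard library. The endpoint shape, function names, and JSON schema below are assumptions for illustration only, not LmRaC's actual API; the sample server in the GitHub repository defines the real interface.

```python
import json
import math
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical user-defined quantitative function; the name and
# parameters are illustrative stand-ins.
def fold_change(params):
    """Log2 fold change between treated and control expression values."""
    return math.log2(float(params["treated"]) / float(params["control"]))

# Registry mapping function names to callables the LLM may invoke.
FUNCTIONS = {"fold_change": fold_change}

def dispatch(name, params):
    """Look up a registered function by name and run it on the params."""
    if name not in FUNCTIONS:
        return {"error": f"unknown function: {name}"}
    return {"result": FUNCTIONS[name](params)}

class FunctionHandler(BaseHTTPRequestHandler):
    """Accept POSTed JSON like {"function": "...", "params": {...}}
    and reply with the dispatch result as JSON."""
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        reply = dispatch(body.get("function", ""), body.get("params", {}))
        data = json.dumps(reply).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(data)

# To serve:
#   HTTPServer(("localhost", 8000), FunctionHandler).serve_forever()
```

Because the functions live behind a plain HTTP interface, users can extend the server with new computations over their own data without modifying the tool itself.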

Availability and implementation: Detailed documentation for LmRaC along with a sample REST API server for defining user functions can be found at https://github.com/dbcraig/LmRaC. The LmRaC web application image can be pulled from Docker Hub (https://hub.docker.com) as dbcraig/lmrac.
