PlantGPT: An Arabidopsis-Based Intelligent Agent that Answers Questions about Plant Functional Genomics.

Impact factor: 14.3 | Region 1 (Materials Science) | JCR Q1 (Chemistry, Multidisciplinary)
Ruixiang Zhang, Yu Wang, Weiyang Yang, Jun Wen, Weizhi Liu, Shipeng Zhi, Guangzhou Li, Nan Chai, Jiaqi Huang, Yongyao Xie, Xianrong Xie, Letian Chen, Miao Gu, Yao-Guang Liu, Qinlong Zhu
Journal: Advanced Science, article e03926
Published: 2025-05-21
Publication type: Journal Article
DOI: 10.1002/advs.202503926 (https://doi.org/10.1002/advs.202503926)

Abstract

Research into plant gene function is crucial for developing strategies to increase crop yields. The recent introduction of large language models (LLMs) offers a means to aggregate large amounts of data into a queryable format, but the output can contain inaccurate or false claims known as hallucinations. To minimize such hallucinations and produce high-quality knowledge-based outputs, the abstracts of over 60 000 plant research articles are compiled into a Chroma database for retrieval-augmented generation (RAG). Linguistic data from 13 993 Arabidopsis (Arabidopsis thaliana) phenotypes and 23 323 gene functions are then used to fine-tune the LLM Llama3-8B, producing PlantGPT, a virtual expert in Arabidopsis phenotype-gene research. Evaluation on test questions demonstrates that PlantGPT outperforms general LLMs in answering specialized questions. The findings provide a blueprint for functional genomics research in food crops and demonstrate the potential for developing LLMs for plant research modalities. To provide broader access and facilitate adoption, the online tool http://www.plantgpt.icu is developed, which will allow researchers to use PlantGPT in their scientific investigations.
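
The retrieval-augmented generation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper stores abstracts in a Chroma database and answers with a fine-tuned Llama3-8B, whereas the example below swaps in a toy bag-of-words cosine retriever over three invented abstracts and stops at prompt construction. All names (ABSTRACTS, retrieve, build_prompt) and the sample texts are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for indexed abstracts; NOT taken from the PlantGPT corpus.
ABSTRACTS = {
    "doc1": "FLC represses flowering in Arabidopsis until vernalization silences it.",
    "doc2": "Auxin transport by PIN proteins shapes root gravitropism in Arabidopsis.",
    "doc3": "Drought stress induces stomatal closure through abscisic acid signaling.",
}


def vectorize(text):
    """Turn text into a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, corpus, k=2):
    """Return the ids of the k abstracts most similar to the question."""
    query = vectorize(question)
    ranked = sorted(
        corpus.items(),
        key=lambda item: cosine(query, vectorize(item[1])),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


def build_prompt(question, corpus, k=2):
    """Prepend the retrieved abstracts to the question as grounding context."""
    context = "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id in retrieve(question, corpus, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    print(build_prompt("Which gene represses flowering before vernalization?", ABSTRACTS, k=1))
```

In a real pipeline the retriever would be a vector store queried with learned embeddings, and the assembled prompt would be passed to the language model. Grounding the answer in retrieved text is what is meant to reduce hallucinations.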

Source journal: Advanced Science
Categories: Chemistry, Multidisciplinary; Nanoscience & Nanotechnology
CiteScore: 18.90
Self-citation rate: 2.60%
Articles published: 1602
Review time: 1.9 months
About the journal: Advanced Science is a prestigious open access journal that focuses on interdisciplinary research in materials science, physics, chemistry, medical and life sciences, and engineering. The journal aims to promote cutting-edge research by employing a rigorous and impartial review process. It is committed to presenting research articles with the highest quality production standards, ensuring maximum accessibility of top scientific findings. With its vibrant and innovative publication platform, Advanced Science seeks to revolutionize the dissemination and organization of scientific knowledge.