{"title":"PlantGPT: An Arabidopsis-Based Intelligent Agent that Answers Questions about Plant Functional Genomics.","authors":"Ruixiang Zhang, Yu Wang, Weiyang Yang, Jun Wen, Weizhi Liu, Shipeng Zhi, Guangzhou Li, Nan Chai, Jiaqi Huang, Yongyao Xie, Xianrong Xie, Letian Chen, Miao Gu, Yao-Guang Liu, Qinlong Zhu","doi":"10.1002/advs.202503926","DOIUrl":null,"url":null,"abstract":"<p><p>Research into plant gene function is crucial for developing strategies to increase crop yields. The recent introduction of large language models (LLMs) offers a means to aggregate large amounts of data into a queryable format, but the output can contain inaccurate or false claims known as hallucinations. To minimize such hallucinations and produce high-quality knowledge-based outputs, the abstracts of over 60 000 plant research articles are compiled into a Chroma database for retrieval-augmented generation (RAG). Then linguistic data are used from 13 993 Arabidopsis (Arabidopsis thaliana) phenotypes and 23 323 gene functions to fine-tune the LLM Llama3-8B, producing PlantGPT, a virtual expert in Arabidopsis phenotype-gene research. By evaluating answers to test questions, it is demonstrated that PlantGPT outperforms general LLMs in answering specialized questions. The findings provide a blueprint for functional genomics research in food crops and demonstrate the potential for developing LLMs for plant research modalities. To provide broader access and facilitate adoption, the online tool http://www.plantgpt.icu is developed, which will allow researchers to use PlantGPT in their scientific investigations.</p>","PeriodicalId":117,"journal":{"name":"Advanced Science","volume":" ","pages":"e03926"},"PeriodicalIF":14.3000,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advanced Science","FirstCategoryId":"88","ListUrlMain":"https://doi.org/10.1002/advs.202503926","RegionNum":1,"RegionCategory":"材料科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0
Abstract
Research into plant gene function is crucial for developing strategies to increase crop yields. The recent introduction of large language models (LLMs) offers a means to aggregate large amounts of data into a queryable format, but the output can contain inaccurate or false claims known as hallucinations. To minimize such hallucinations and produce high-quality knowledge-based outputs, the abstracts of over 60 000 plant research articles are compiled into a Chroma database for retrieval-augmented generation (RAG). Then linguistic data are used from 13 993 Arabidopsis (Arabidopsis thaliana) phenotypes and 23 323 gene functions to fine-tune the LLM Llama3-8B, producing PlantGPT, a virtual expert in Arabidopsis phenotype-gene research. By evaluating answers to test questions, it is demonstrated that PlantGPT outperforms general LLMs in answering specialized questions. The findings provide a blueprint for functional genomics research in food crops and demonstrate the potential for developing LLMs for plant research modalities. To provide broader access and facilitate adoption, the online tool http://www.plantgpt.icu is developed, which will allow researchers to use PlantGPT in their scientific investigations.
期刊介绍:
Advanced Science is a prestigious open access journal that focuses on interdisciplinary research in materials science, physics, chemistry, medical and life sciences, and engineering. The journal aims to promote cutting-edge research by employing a rigorous and impartial review process. It is committed to presenting research articles with the highest quality production standards, ensuring maximum accessibility of top scientific findings. With its vibrant and innovative publication platform, Advanced Science seeks to revolutionize the dissemination and organization of scientific knowledge.