SeedLLM·Rice: A large language model integrated with rice biological knowledge graph.

IF 24.1, Zone 1 (Biology), JCR Q1 (BIOCHEMISTRY & MOLECULAR BIOLOGY)
Molecular Plant Pub Date : 2025-07-07 Epub Date: 2025-05-28 DOI:10.1016/j.molp.2025.05.013
Fan Yang, Huanjun Kong, Jie Ying, Zihong Chen, Tao Luo, Wanli Jiang, Zhonghang Yuan, Zhefan Wang, Zhaona Ma, Shikuan Wang, Wanfeng Ma, Xiaoyi Wang, Xiaoying Li, Zhengyin Hu, Xiaodong Ma, Minguo Liu, Xiqing Wang, Fan Chen, Nanqing Dong
Molecular Plant, pages 1118-1129. Publication type: Journal Article. Open access: no. https://doi.org/10.1016/j.molp.2025.05.013
Citations: 0

Abstract

Rice biology research involves complex decision-making, requiring researchers to navigate a rapidly expanding body of knowledge encompassing extensive literature and multiomics data. The exponential increase in biological data and scientific publications presents significant challenges for efficiently extracting meaningful insights. Although large language models (LLMs) show promise for knowledge retrieval, their application to rice-specific research has been limited by the absence of specialized models and the challenge of synthesizing multimodal data integral to the field. Moreover, the lack of standardized evaluation frameworks for domain-specific tasks impedes the effective assessment of model performance. To address these challenges, we introduce SeedLLM·Rice (SeedLLM), a 7-billion-parameter model trained on 1.4 million rice-related publications, representing nearly 98.24% of global rice research output. Additionally, we present a novel human-centric evaluation framework designed to assess LLM performance in rice biology tasks. Initial evaluations demonstrate that SeedLLM outperforms general-purpose models, including OpenAI GPT-4o1 and DeepSeek-R1, achieving win rates of 57% to 88% on rice-specific tasks. Furthermore, SeedLLM is integrated with the Rice Biological Knowledge Graph (RBKG), which consolidates genome annotations for Nipponbare and large-scale synthesis of transcriptomic and proteomic information from over 1800 studies. This integration enhances the ability of SeedLLM to address complex research questions requiring the fusion of textual and multiomics data. To facilitate global collaboration, we provide free access to SeedLLM and the RBKG via an interactive web portal (https://seedllm.org.cn/). SeedLLM represents a transformative tool for rice biology research, enabling unprecedented discoveries in crop improvement and climate adaptation through advanced reasoning and comprehensive data integration.

Source journal: Molecular Plant
Subject area: Plant Sciences, Biochemistry & Molecular Biology
CiteScore: 37.60
Self-citation rate: 2.20%
Articles published: 1784
Review time: 1 month
Journal description: Molecular Plant is dedicated to serving the plant science community by publishing novel and exciting findings with high significance in plant biology. The journal focuses broadly on cellular biology, physiology, biochemistry, molecular biology, genetics, development, plant-microbe interaction, genomics, bioinformatics, and molecular evolution. Molecular Plant publishes original research articles, reviews, Correspondence, and Spotlights on the most important developments in plant biology.