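VMGP: A unified variational auto-encoder based multi-task model for multi-phenotype, multi-environment, and cross-population genomic selection in plants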

Impact factor: 12.4 (JCR Q1, Agriculture, Multidisciplinary)

Artificial Intelligence in Agriculture, Volume 15, Issue 4, Pages 829-842. Pub Date: 2025-06-24. DOI: 10.1016/j.aiia.2025.06.007

Xiangyu Zhao, Fuzhen Sun, Jinlong Li, Dongfeng Zhang, Qiusi Zhang, Zhongqiang Liu, Changwei Tan, Hongxiang Ma, Kaiyi Wang


Citations: 0

Abstract



VMGP: A unified variational auto-encoder based multi-task model for multi-phenotype, multi-environment, and cross-population genomic selection in plants

Plant breeding stands as a cornerstone for agricultural productivity and the safeguarding of food security. The advent of Genomic Selection heralds a new epoch in breeding, characterized by its capacity to harness whole-genome variation for genomic prediction. This approach transcends the need for prior knowledge of genes associated with specific traits. Nonetheless, the vast dimensionality of genomic data juxtaposed with the relatively limited number of phenotypic samples often leads to the “curse of dimensionality”, where traditional statistical, machine learning, and deep learning methods are prone to overfitting and suboptimal predictive performance. To surmount this challenge, we introduce a unified Variational auto-encoder based Multi-task Genomic Prediction model (VMGP) that integrates self-supervised genomic compression and reconstruction with multiple prediction tasks. This approach provides a robust solution, offering a formidable predictive framework that has been rigorously validated across public datasets for wheat, rice, and maize. Our model demonstrates exceptional capabilities in multi-phenotype and multi-environment genomic prediction, successfully navigating the complexities of cross-population genomic selection and underscoring its unique strengths and utility. Furthermore, by integrating VMGP with model interpretability, we can effectively triage relevant single nucleotide polymorphisms, thereby enhancing prediction performance and proposing potential cost-effective genotyping solutions. The VMGP framework, with its simplicity, stable predictive prowess, and open-source code, is exceptionally well-suited for broad dissemination within plant breeding programs. It is particularly advantageous for breeders who prioritize phenotype prediction yet may not possess extensive knowledge in deep learning or proficiency in parameter tuning.
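The abstract describes combining self-supervised genomic compression and reconstruction with multiple prediction tasks, but it does not state the training objective. As a rough illustration only, the sketch below shows how a generic variational auto-encoder loss is commonly combined with a multi-task regression loss for a single sample. The function names, the use of mean squared error, the `beta` and `task_weight` weights, and the handling of missing phenotypes as `None` are all assumptions made for this example; they are not taken from the VMGP paper or its code.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I) for one sample."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_vae_loss(genotype, reconstruction, mu, log_var,
                       pheno_pred, pheno_true, beta=1.0, task_weight=1.0):
    """Hypothetical combined objective: reconstruction + beta * KL + weighted task loss.

    genotype / reconstruction: marker codes (e.g. 0/1/2) and their decoded values.
    mu / log_var: encoder outputs describing the latent Gaussian.
    pheno_pred / pheno_true: one entry per trait or environment; a true value
    of None marks an unobserved phenotype and is skipped (an assumption).
    """
    recon_loss = mse(reconstruction, genotype)
    kl_loss = kl_to_standard_normal(mu, log_var)
    observed = [(p, t) for p, t in zip(pheno_pred, pheno_true) if t is not None]
    if observed:
        task_loss = sum((p - t) ** 2 for p, t in observed) / len(observed)
    else:
        task_loss = 0.0
    return recon_loss + beta * kl_loss + task_weight * task_loss


if __name__ == "__main__":
    loss = multitask_vae_loss(
        genotype=[0, 1, 2],
        reconstruction=[0.0, 1.0, 1.0],
        mu=[1.0],
        log_var=[0.0],
        pheno_pred=[2.0, 5.0],
        pheno_true=[1.0, None],
    )
    print(round(loss, 4))
```

In a sketch like this, the reconstruction and KL terms can be computed from genotypes alone, while the task term needs only the phenotypes that were actually measured. That is one plausible way a shared latent representation could be regularized when phenotypic samples are scarce relative to the number of markers.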


Source journal: Artificial Intelligence in Agriculture (Engineering: Engineering (miscellaneous))

CiteScore: 21.60

Self-citation rate: 0.00%

Review time: 12 weeks