GxENet: Novel fully connected neural network based approaches to incorporate GxE for predicting wheat yield

Journal impact factor 8.2 (JCR Q1, Agriculture, Multidisciplinary)
Sheikh Jubair, Olivier Tremblay-Savard, Mike Domaratzki
Artificial Intelligence in Agriculture, vol. 8, pp. 60-76, June 2023. DOI: 10.1016/j.aiia.2023.05.001
https://www.sciencedirect.com/science/article/pii/S2589721723000168

Abstract

The expression of a quantitative trait in a crop line depends on the line's genetics, the environment in which it is sown, and the interaction between genotype and environment, known as GxE. Thus, to maximize food production, new varieties are developed by selecting superior seed lines suited to a specific environment. Genomic selection is a computational technique for developing new varieties that uses whole-genome molecular markers to identify the top lines of a crop. Many statistical and machine learning models are applied to single-environment trials, which assume that the environment has no effect on the quantitative traits. However, both genomic and environmental data must be considered when developing a new variety, because this strong assumption may cause the top lines for an environment to be missed. Here we devised three novel deep learning frameworks that incorporate GxE within the deep learning model and predict line-specific yield for an environment. In the process, we also developed a new technique for identifying environment-specific markers that can be useful in many applications of environment-specific genomic selection. The results demonstrate that, depending on the test scenario, our best framework obtains correlation coefficients 1.75 to 1.95 times higher than those of other deep learning models that incorporate environmental data. Furthermore, feature importance analysis shows that environmental information, followed by genomic information, is the driving factor in predicting environment-specific yield for a line. We also demonstrate a way to extend our framework to new data types, such as text or soil data, and the extended model also shows potential for use in genomic selection.
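The abstract does not specify the GxENet layers or how genomic and environmental inputs are combined, so the following is only a rough illustration of what "incorporating GxE within a fully connected model" can look like. It is a minimal sketch, assuming one common encoding: genotype markers (coded 0/1/2), environmental covariates, and their pairwise products concatenated into a single input vector for a small untrained multilayer perceptron. It also assumes the reported correlation coefficient is Pearson's r. All names (`gxe_features`, `DenseLayer`, `TinyGxEMLP`, `pearson`) are hypothetical, and this is not the authors' architecture; training is omitted.

```python
import math
import random
from typing import List, Sequence


def gxe_features(markers: Sequence[float], env: Sequence[float]) -> List[float]:
    """Hypothetical encoding: G features, E features, then all G x E products."""
    interaction = [m * e for m in markers for e in env]
    return list(markers) + list(env) + interaction


class DenseLayer:
    """A fully connected layer computing y = W x + b (random init, untrained)."""

    def __init__(self, n_in: int, n_out: int, rng: random.Random) -> None:
        scale = 1.0 / math.sqrt(n_in)
        self.w = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.0] * n_out

    def forward(self, x: Sequence[float]) -> List[float]:
        # One dot product per output unit, plus its bias.
        return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(self.w, self.b)]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in v]


class TinyGxEMLP:
    """Hypothetical two-layer regressor over G, E and GxE features (not GxENet)."""

    def __init__(self, n_markers: int, n_env: int, n_hidden: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        n_in = n_markers + n_env + n_markers * n_env
        self.hidden = DenseLayer(n_in, n_hidden, rng)
        self.out = DenseLayer(n_hidden, 1, rng)

    def predict(self, markers: Sequence[float], env: Sequence[float]) -> float:
        x = gxe_features(markers, env)
        return self.out.forward(relu(self.hidden.forward(x)))[0]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between observed and predicted yields."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    model = TinyGxEMLP(n_markers=3, n_env=2)
    line = [0, 1, 2]  # one line's marker genotypes
    for env in ([0.5, -1.0], [1.5, 0.2]):  # two environments
        print(env, model.predict(line, env))
```

Because the interaction terms enter the input vector directly, the same line can receive different predicted yields in different environments even with a purely additive first layer. In a real comparison the model would be trained first, and `pearson` would then be applied to observed versus predicted yields on a held-out test set.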

Source journal: Artificial Intelligence in Agriculture (Engineering: Engineering (miscellaneous))
CiteScore: 21.60; self-citation rate: 0.00%; articles published: 18; review time: 12 weeks