Effects of using deep learning to predict the geographic origin of barley genebank accessions on genome-environment association studies.

IF 4.2 | Zone 1, Agricultural and Forestry Sciences | JCR Q1, Agronomy
Che-Wei Chang, Karl Schmid
Journal: Theoretical and Applied Genetics, 138(9): 211
Published: 2025-08-12
Publication type: Journal Article
DOI: 10.1007/s00122-025-05003-w (https://doi.org/10.1007/s00122-025-05003-w)
Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12343745/pdf/

Abstract

Genome-environment association (GEA) is an approach for identifying adaptive loci by combining genetic variation with environmental parameters, offering potential for improving crop resilience. However, its application to genebank accessions is limited by missing geographic origin data. To address this limitation, we explored the use of neural networks to predict the geographic origins of barley accessions and to integrate the imputed environmental data into GEA. The neural networks achieved high accuracy in cross-validation but occasionally produced ecologically implausible predictions because the models considered only geographical proximity. For example, some predicted origins fell within non-arable regions, such as the Mediterranean Sea. Using barley flowering time genes as benchmarks, GEA integrating imputed environmental data (N = 11,032) showed partially concordant yet complementary detection of genomic regions near flowering time genes compared to regular GEA (N = 1,626), highlighting the potential of GEA with imputed data to complement regular GEA in uncovering novel adaptive loci. Contrary to our initial hypothesis that a larger sample size would significantly improve GEA performance, our simulations suggest that GEA approaches have limited sensitivity to the considerable expansion in sample size achieved by predicting missing geographical data. Overall, our study provides insights into leveraging incomplete geographical origin data by integrating deep learning with GEA. Our findings indicate the need for further development of GEA approaches to optimize the use of imputed environmental data, for example by incorporating regional GEA patterns instead of focusing solely on global associations between allele frequencies and environmental gradients across large-scale landscapes.
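
The abstract reports cross-validation accuracy for predicted origins without naming the error metric. As a minimal sketch of one common way to score such predictions, the code below computes the great-circle (haversine) distance between known and predicted coordinates and summarizes it as a median error in kilometres. This is not the authors' code: the metric choice, the function names `haversine_km` and `median_error_km`, and the Earth-radius constant are all assumptions made for illustration.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumed constant for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def median_error_km(true_coords, pred_coords):
    """Median great-circle error over paired (lat, lon) tuples (hypothetical helper)."""
    errors = [
        haversine_km(t_lat, t_lon, p_lat, p_lon)
        for (t_lat, t_lon), (p_lat, p_lon) in zip(true_coords, pred_coords)
    ]
    return statistics.median(errors)
```

A distance-only score of this kind illustrates the limitation noted in the abstract: a prediction placed in the Mediterranean Sea can have a small error in kilometres and still be ecologically implausible, so a plausibility check (for example, against a land or cropland mask) would have to be applied separately.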

Source journal: Theoretical and Applied Genetics
CiteScore: 9.60
Self-citation rate: 7.40%
Articles published: 241
Review time: 2.3 months
About the journal: Theoretical and Applied Genetics publishes original research and review articles in all key areas of modern plant genetics, plant genomics and plant biotechnology. All work needs to have a clear genetic component and significant impact on plant breeding. Theoretical considerations are only accepted in combination with new experimental data and/or if they indicate a relevant application in plant genetics or breeding. Emphasizing the practical, the journal focuses on research into leading crop plants and articles presenting innovative approaches.