Effects of using deep learning to predict the geographic origin of barley genebank accessions on genome-environment association studies.
Che-Wei Chang, Karl Schmid
Theoretical and Applied Genetics, 138(9): 211. Published 2025-08-12. DOI: 10.1007/s00122-025-05003-w
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12343745/pdf/
Genome-environment association (GEA) is an approach for identifying adaptive loci by combining genetic variation with environmental parameters, offering potential for improving crop resilience. However, its application to genebank accessions is limited by missing geographic origin data. To address this limitation, we explored the use of neural networks to predict the geographic origins of barley accessions and integrate imputed environmental data into GEA. Neural networks demonstrated high accuracy in cross-validation but occasionally produced ecologically implausible predictions as models solely considered geographical proximity. For example, some predicted origins were located within non-arable regions, such as the Mediterranean Sea. Using barley flowering time genes as benchmarks, GEA integrating imputed environmental data (N = 11,032) displayed partially concordant yet complementary detection of genomic regions near flowering time genes compared to regular GEA (N = 1,626), highlighting the potential of GEA with imputed data to complement regular GEA in uncovering novel adaptive loci. Also, contrary to our initial hypothesis anticipating a significant improvement in GEA performance by increasing sample size, our simulations yield unexpected insights. Our study suggests potential limitations in the sensitivity of GEA approaches to the considerable expansion in sample size achieved through predicting missing geographical data. Overall, our study provides insights into leveraging incomplete geographical origin data by integrating deep learning with GEA. Our findings indicate the need for further development of GEA approaches to optimize the use of imputed environmental data, such as incorporating regional GEA patterns instead of solely focusing on global associations between allele frequencies and environmental gradients across large-scale landscapes.
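The abstract reports cross-validation accuracy for predicted geographic origins but does not say which error metric was used. As an illustration only, one common way to score such predictions is the great-circle distance between the known and the predicted coordinates. The sketch below assumes that metric; the function names `haversine_km` and `median_error_km`, the Earth-radius constant, and the example coordinates are hypothetical and are not taken from the paper.

```python
import math
import statistics

# Mean Earth radius in kilometres (an assumed constant for this sketch).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def median_error_km(true_coords, predicted_coords):
    """Median great-circle error over paired (lat, lon) tuples."""
    errors = [
        haversine_km(t_lat, t_lon, p_lat, p_lon)
        for (t_lat, t_lon), (p_lat, p_lon) in zip(true_coords, predicted_coords)
    ]
    return statistics.median(errors)


if __name__ == "__main__":
    # Hypothetical held-out accessions: known origins vs. model predictions.
    known = [(36.0, 10.0), (40.0, 35.0), (9.0, 39.0)]
    predicted = [(36.5, 10.5), (40.0, 35.0), (10.0, 38.0)]
    print(f"median error: {median_error_km(known, predicted):.1f} km")
```

A distance-only score like this would not flag a prediction that falls in the sea, which matches the abstract's observation that models considering only geographical proximity can produce ecologically implausible origins.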
About the journal:
Theoretical and Applied Genetics publishes original research and review articles in all key areas of modern plant genetics, plant genomics and plant biotechnology. All work needs to have a clear genetic component and significant impact on plant breeding. Theoretical considerations are only accepted in combination with new experimental data and/or if they indicate a relevant application in plant genetics or breeding. Emphasizing the practical, the journal focuses on research into leading crop plants and articles presenting innovative approaches.