Comparing artificial and convolutional neural networks with traditional models for Genomic prediction in wheat.
Authors: Wei Zhao, Jie Sheng
Journal: Molecular Breeding, 45(9):75 (JCR Q1, Agronomy; impact factor 3.0)
Published: 2025-09-10 (eCollection 2025-09-01)
DOI: https://doi.org/10.1007/s11032-025-01598-6
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12423009/pdf/
Citations: 0
Abstract
With the rapid development of sequencing technology, genomic prediction has become increasingly common in livestock and crop breeding schemes. Selecting an appropriate statistical model is central to achieving high prediction accuracy. Recently, machine learning models have been expected to bring genomic prediction into a new era. However, this expectation still lacks evidence that machine learning models generally outperform traditional ones on empirical data sets. In this study, we compared two machine learning models, based on an artificial neural network (ANN) and a convolutional neural network (CNN), with four traditional models, namely genomic best linear unbiased prediction (GBLUP), Bayesian ridge regression (BRR), BayesA and BayesB, using three published data sets for grain yield in wheat. For each model, we considered two variants: modeling and ignoring the genotype-by-environment (G × E) interaction. We also considered two cross-validation strategies: predicting genotypes that have not been evaluated in any environment (CV1) and predicting genotypes that have been tested in other environments (CV2). Our results showed that the traditional Bayesian models (BayesA, BayesB, and BRR) outperformed GBLUP, ANN and CNN when the G × E interaction was modeled. The accuracies of ANN and CNN were higher than those of the traditional models only in CV1 and when the G × E interaction was ignored. The performance of the two machine learning models was significantly affected by the interaction between the CV strategy and the treatment of the G × E interaction, whereas that of the four traditional models was influenced only by whether the G × E interaction was modeled. Thus, machine learning models can be a powerful complement to the traditional ones, and their superiority may depend on the prediction scenario. Of the two machine learning models, ANN was more accurate than CNN in most cases, indicating that it is still challenging to adapt complex machine learning models such as CNN to genomic prediction.
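The difference between the two cross-validation strategies can be illustrated with a minimal Python sketch. The function names `cv1_folds` and `cv2_folds`, the toy layout of 6 genotypes in 3 environments, and the fold-assignment details are assumptions made for illustration; they are not the authors' implementation.

```python
import numpy as np


def cv1_folds(genotype_ids, n_folds, seed=0):
    """CV1: hold out whole genotypes, so a test genotype is unseen in every environment."""
    rng = np.random.default_rng(seed)
    unique, inverse = np.unique(np.asarray(genotype_ids), return_inverse=True)
    # One fold label per genotype, then broadcast to all of its records.
    fold_of_genotype = rng.permutation(len(unique)) % n_folds
    return fold_of_genotype[inverse]


def cv2_folds(genotype_ids, n_folds, seed=0):
    """CV2: hold out single genotype-environment records.

    Records of the same genotype are spread over different folds, so a
    held-out record's genotype is still observed in other environments
    (assuming 2 <= records per genotype <= n_folds).
    """
    rng = np.random.default_rng(seed)
    genotype_ids = np.asarray(genotype_ids)
    folds = np.empty(len(genotype_ids), dtype=int)
    for g in np.unique(genotype_ids):
        idx = np.flatnonzero(genotype_ids == g)
        rng.shuffle(idx)                  # random order of this genotype's records
        start = rng.integers(n_folds)     # random starting fold
        folds[idx] = (start + np.arange(len(idx))) % n_folds
    return folds


# Toy layout (hypothetical): 6 genotypes, each evaluated in 3 environments.
genotypes = np.repeat(np.arange(6), 3)
environments = np.tile(np.arange(3), 6)

f1 = cv1_folds(genotypes, n_folds=3)
f2 = cv2_folds(genotypes, n_folds=3)

for k in range(3):
    cv1_test = set(genotypes[f1 == k])
    cv1_train = set(genotypes[f1 != k])
    cv2_test = set(genotypes[f2 == k])
    cv2_train = set(genotypes[f2 != k])
    print(f"fold {k}: CV1 overlap={len(cv1_test & cv1_train)}, "
          f"CV2 test genotypes seen in training={cv2_test <= cv2_train}")
```

In CV1 the test genotypes never appear in the training data, so prediction relies on marker information alone. In CV2 the same genotype contributes phenotypes from other environments to the training data.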
About the journal:
Molecular Breeding is an international journal publishing papers on applications of plant molecular biology, i.e., research most likely leading to practical applications. The practical applications might relate to the developing as well as the industrialised world and have demonstrable benefits for the seed industry, farmers, processing industry, the environment and the consumer.
All papers published should contribute to the understanding and progress of modern plant breeding, encompassing the scientific disciplines of molecular biology, biochemistry, genetics, physiology, pathology, plant breeding, and ecology among others.
Molecular Breeding welcomes the following categories of papers: full papers, short communications, papers describing novel methods and review papers. All submissions will be subject to peer review, ensuring the highest possible scientific quality standards.
Molecular Breeding core areas:
Molecular Breeding will consider manuscripts describing contemporary methods of molecular genetics and genomic analysis, structural and functional genomics in crops, proteomics and metabolic profiling, abiotic stress and field evaluation of transgenic crops containing particular traits. Manuscripts on marker assisted breeding are also of major interest, in particular novel approaches and new results of marker assisted breeding, QTL cloning, integration of conventional and marker assisted breeding, and QTL studies in crop plants.