Benchmarking of feed-forward neural network models for genomic prediction of quantitative traits in pigs.

Impact factor: 2.8; CAS Zone 3 (Biology); JCR Q2, GENETICS & HEREDITY

Frontiers in Genetics; Pub Date: 2025-06-18; eCollection Date: 2025-01-01; DOI: 10.3389/fgene.2025.1618891

Junjian Wang, Francesco Tiezzi, Yijian Huang, Christian Maltecca, Jicai Jiang

Volume 16, Article 1618891; full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12213717/pdf/

Citations: 0

Abstract

Artificial neural networks are machine learning models that have been applied to various genomic problems, with the ability to learn non-linear relationships and model high-dimensional data. These modeling capabilities make them promising candidates for genomic prediction, since they may capture the intricate relationships between genetic variants and phenotypes. Despite these theoretical advantages, neural networks have performed inconsistently in previous genomic prediction research, and few studies have evaluated their performance and feasibility for pig genomic prediction using large-scale data.

We evaluated the predictive performance of feed-forward neural network (FFNN) models implemented in TensorFlow, with architectures ranging from a single-layer structure (no hidden layers) to a four-layer structure (three hidden layers). These FFNN models were compared with five linear methods: GBLUP, LDAK-BOLT, BayesR, SLEMM-WW, and scikit-learn's ridge regression. The evaluation used data on six quantitative traits: off-test body weight (WT), off-test back fat thickness (BF), off-test loin muscle depth (MS), number of piglets born alive (NBA), number of piglets born dead (NBD), and number of piglets weaned (NW). We also assessed the computational efficiency of FFNN models on both CPU and GPU. The benchmarking employed repeated random subsampling validation, with sample sizes ranging from 3,290 individuals for reproductive traits to over 26,000 individuals for production traits, drawn from a total of 27,481 genotyped pigs. Hyperband tuning was used to optimize the hyperparameters and select the best model for each structure.

Results showed that FFNN models consistently underperformed the linear methods across all architectures tested. The one-layer structure yielded the best predictive accuracy among the FFNN approaches. Of the five linear methods, SLEMM-WW demonstrated the best balance of computational efficiency and predictive ability. GPUs offered significant computational efficiency gains over CPUs for multi-layer FFNN models, though FFNN models remained more computationally demanding than most linear methods. In conclusion, FFNN models with up to four layers did not improve genomic prediction compared with routine linear methods for pig quantitative traits.
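To make "repeated random subsampling validation" concrete, here is a minimal sketch of the general idea using a ridge regression on SNP genotypes. It is not the authors' pipeline. The simulated genotypes and phenotypes, the closed-form NumPy ridge solver (a stand-in for scikit-learn's ridge regression), the fixed penalty `lam=100`, the 80/20 split and the 10 repetitions are all illustrative assumptions, and the helper names `simulate`, `ridge_fit`, `ridge_predict` and `repeated_subsampling` are hypothetical. Accuracy is taken here as the Pearson correlation between predicted and observed phenotypes in the held-out set, which is one common definition and may differ from the one used in the paper.

```python
import numpy as np


def simulate(n=600, p=300, n_qtl=30, h2=0.5, seed=1):
    """Hypothetical data: additive genotypes coded 0/1/2 and a phenotype with heritability h2."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.05, 0.5, size=p)                 # allele frequency per SNP
    X = rng.binomial(2, freqs, size=(n, p)).astype(float)  # genotype matrix, n x p
    effects = np.zeros(p)
    qtl = rng.choice(p, size=n_qtl, replace=False)         # a few causal SNPs
    effects[qtl] = rng.normal(size=n_qtl)
    g = X @ effects                                         # true genetic values
    noise_sd = np.sqrt(g.var() * (1.0 - h2) / h2)           # residual SD giving heritability h2
    y = g + rng.normal(scale=noise_sd, size=n)
    return X, y


def ridge_fit(X, y, lam):
    """Closed-form ridge on centered data: beta = (Xc'Xc + lam*I)^-1 Xc'(y - mean(y))."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    A = Xc.T @ Xc + lam * np.eye(X.shape[1])
    beta = np.linalg.solve(A, Xc.T @ (y - y_mean))
    return x_mean, y_mean, beta


def ridge_predict(model, X):
    """Predict phenotypes for new genotypes using the training-set centering."""
    x_mean, y_mean, beta = model
    return y_mean + (X - x_mean) @ beta


def repeated_subsampling(X, y, lam, n_reps=10, test_frac=0.2, seed=0):
    """Repeat: random train/test split, fit on train, correlate predictions with test phenotypes."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_test = int(round(n * test_frac))
    accs = []
    for _ in range(n_reps):
        perm = rng.permutation(n)                 # fresh random split each repetition
        test, train = perm[:n_test], perm[n_test:]
        model = ridge_fit(X[train], y[train], lam)
        y_hat = ridge_predict(model, X[test])
        accs.append(np.corrcoef(y_hat, y[test])[0, 1])
    return float(np.mean(accs)), float(np.std(accs))


X, y = simulate()
mean_acc, sd_acc = repeated_subsampling(X, y, lam=100.0, n_reps=10)
print(f"mean accuracy {mean_acc:.3f} (SD {sd_acc:.3f})")
```

Because the splitting loop only calls a fit function and a predict function, another model (for example a neural network) could be dropped into the same loop so that every method is scored on identical random splits.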

Source journal

Frontiers in Genetics; subject area: Biochemistry, Genetics and Molecular Biology (Molecular Medicine)

CiteScore: 5.50
Self-citation rate: 8.10%
Articles published: 3491
Review time: 14 weeks

Journal description: Frontiers in Genetics publishes rigorously peer-reviewed research on genes and genomes relating to all the domains of life, from humans to plants to livestock and other model organisms. Led by an outstanding Editorial Board of the world’s leading experts, this multidisciplinary, open-access journal is at the forefront of communicating cutting-edge research to researchers, academics, clinicians, policy makers and the public. The study of inheritance and the impact of the genome on various biological processes is well documented. However, the majority of discoveries are still to come. A new era is seeing major developments in the function and variability of the genome, the use of genetic and genomic tools and the analysis of the genetic basis of various biological phenomena.