Integrating gene expression data via weighted multiple kernel ridge regression improved accuracy of genomic prediction
Xue Wang, Jingfang Si, Yachun Wang, Lingzhao Fang, Zhe Zhang, Yi Zhang
Genetics Selection Evolution (impact factor 3.1; JCR Q1, Agriculture, Dairy & Animal Science), published 2025-09-25
DOI: https://doi.org/10.1186/s12711-025-00997-9
Citations: 0
Abstract
Gene expression profiles hold potentially valuable information for predicting breeding values and phenotypes. In practical breeding programs, however, most individuals in the reference population have only genomic data and lack transcriptomic data. Predicting gene expression from genetic markers and integrating the genetically predicted expression into genomic prediction may offer a solution. This study extends kernel ridge regression (KRR) to weighted multiple kernel ridge regression (WMKRR), which integrates genomic data and transcriptomic data predicted from genetic markers through a multiple kernel learning (MKL) approach. We compared the predictive ability of WMKRR with that of traditional genomic best linear unbiased prediction (GBLUP) and a combined genomic and transcriptomic best linear unbiased prediction (GTBLUP), both with and without genotype feature selection, in two datasets: (i) a simulated dataset (n = 3305) based on the Cattle Genotype-Tissue Expression (CattleGTEx) dataset and (ii) a real dairy cattle dataset (n = 5515). WMKRR yielded higher predictive ability than GBLUP and GTBLUP in both the simulated and the real dairy cattle data. For the simulated data based on CattleGTEx, WMKRR improved predictive ability over GBLUP and GTBLUP by an average of 1.12% and 1.13%, respectively, without feature selection, and by 3.17% and 3.23%, respectively, with feature selection. For the real dairy cattle data, in cross-validation, WMKRR improved over GBLUP and GTBLUP by an average of 5.56% and 7.23%, respectively, without feature selection, and by 5.66% and 6.40%, respectively, with feature selection. In forward validation, WMKRR improved over GBLUP and GTBLUP by an average of 5.68% and 8.41%, respectively, without feature selection, and by 4.66% and 7.06%, respectively, with feature selection.
Our results demonstrate that the WMKRR model, which integrates genomic and genetically predicted transcriptomic data, achieves better prediction performance than traditional genomic prediction models. This study shows the potential to enhance genomic breeding applications with omics data at no additional omics sequencing cost.
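The abstract does not give the WMKRR equations, so the following is only a minimal sketch of the general idea of kernel ridge regression on a weighted sum of a genomic kernel and a kernel built from genetically predicted expression. It assumes linear kernels, a fixed ridge penalty, and a kernel weight chosen by grid search on a held-out validation split; none of these choices is confirmed by the source. All names (`linear_kernel`, `wmkrr_fit_predict`, `lam`, `best_w`) are hypothetical, and the data are random toy data, not the CattleGTEx or dairy cattle data.

```python
import numpy as np


def linear_kernel(X):
    """Linear kernel on column-centered features, scaled to a mean diagonal of 1."""
    Xc = X - X.mean(axis=0)
    K = Xc @ Xc.T
    return K / np.mean(np.diag(K))


def wmkrr_fit_predict(kernels, weights, y, train_idx, test_idx, lam=1.0):
    """Kernel ridge regression on a weighted sum of kernels (illustrative only).

    kernels : list of (n, n) kernel matrices over all individuals
    weights : non-negative kernel weights, renormalized here to sum to 1
    lam     : ridge penalty (assumed fixed; in practice it would be tuned)
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    K = sum(wi * Ki for wi, Ki in zip(w, kernels))
    K_train = K[np.ix_(train_idx, train_idx)]
    K_cross = K[np.ix_(test_idx, train_idx)]
    mu = y[train_idx].mean()
    alpha = np.linalg.solve(K_train + lam * np.eye(len(train_idx)), y[train_idx] - mu)
    return mu + K_cross @ alpha


# Toy data (assumption): n individuals, m SNPs coded 0/1/2, g genes.
rng = np.random.default_rng(0)
n, m, g = 200, 500, 50
M = rng.integers(0, 3, size=(n, m)).astype(float)
B = rng.normal(size=(m, g)) * (rng.random((m, g)) < 0.05)  # sparse marker-to-expression effects
T_hat = M @ B                                              # genetically predicted expression
y = 0.1 * (T_hat @ rng.normal(size=g)) + 0.05 * (M @ rng.normal(size=m)) + rng.normal(size=n)

K_g = linear_kernel(M)      # genomic kernel
K_t = linear_kernel(T_hat)  # predicted-transcriptome kernel

idx = rng.permutation(n)
train_idx, val_idx, test_idx = idx[:120], idx[120:160], idx[160:]


def accuracy(w):
    """Pearson correlation on the validation split for genomic-kernel weight w."""
    pred_val = wmkrr_fit_predict([K_g, K_t], [w, 1 - w], y, train_idx, val_idx)
    return np.corrcoef(pred_val, y[val_idx])[0, 1]


# Choose the genomic-kernel weight on the validation split, then refit and predict the test split.
best_w = max(np.linspace(0.0, 1.0, 11), key=accuracy)
fit_idx = np.concatenate([train_idx, val_idx])
pred = wmkrr_fit_predict([K_g, K_t], [best_w, 1 - best_w], y, fit_idx, test_idx)
r_test = np.corrcoef(pred, y[test_idx])[0, 1]
print(f"genomic kernel weight: {best_w:.1f}, test correlation: {r_test:.3f}")
```

Setting the weight to 1 reduces this sketch to single-kernel KRR on the genomic kernel, which gives a simple baseline against which to compare the combined kernel on the same split.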
About the journal:
Genetics Selection Evolution invites basic, applied and methodological content that will aid the current understanding and the utilization of genetic variability in domestic animal species. Although the focus is on domestic animal species, research on other species is invited if it contributes to the understanding of the use of genetic variability in domestic animals. Genetics Selection Evolution publishes results from all levels of study, from the gene to the quantitative trait, from the individual to the population, the breed or the species. Contributions concerning both the biological approach, from molecular genetics to quantitative genetics, as well as the mathematical approach, from population genetics to statistics, are welcome. Specific areas of interest include but are not limited to: gene and QTL identification, mapping and characterization, analysis of new phenotypes, high-throughput SNP data analysis, functional genomics, cytogenetics, genetic diversity of populations and breeds, genetic evaluation, applied and experimental selection, genomic selection, selection efficiency, and statistical methodology for the genetic analysis of phenotypes with quantitative and mixed inheritance.