The effect of different approaches to determining the regularization parameter of Bayesian LASSO on the accuracy of genomic prediction
Authors: Hamid Sahebalam, Mohsen Gholizadeh, Seyed Hassan Hafezian
Journal: Mammalian Genome, pp. 331-345 (published 2025-03-01; Epub 2024-12-11)
DOI: https://doi.org/10.1007/s00335-024-10088-7
Using dense genomic markers opens up new opportunities and challenges for breeding programs. The need to penalize marker-specific regression coefficients becomes particularly important when dense markers are available. Therefore, fitting the marker effects to observations using a regularization technique, such as Bayesian LASSO (BL) regression, is of great interest. When the Laplace prior distribution is applied to the regression coefficients, BL can be interpreted as L1-norm regularization within the Bayesian framework. A critical issue is the appropriate selection of hyperparameter values in the prior distributions of regularization techniques, as these values essentially control the sparsity of the estimated model. The purpose of this study was to evaluate different approaches for selecting the regularization parameter in BL: fully Bayesian approaches, such as a gamma prior (BL_Gamma), a beta prior (BL_Beta), and a fixed prior (BL_Fixed), as well as data-driven approaches such as cross-validation based on mean square error (BL_CV_MSE) and on prediction accuracy (BL_CV_PA). Additionally, information-criteria-based methods, including Akaike's information criterion (BL_AIC), the Bayesian information criterion (BL_BIC), and the deviance information criterion (BL_DIC), were explored. For this purpose, a genome containing eight chromosomes (each 1 Morgan in length) with 100 randomly distributed quantitative trait loci was simulated. The studied scenarios were as follows: scenario 1 involved 4000 markers and a heritability of 0.2; scenario 2 involved 4000 markers and a heritability of 0.6; scenario 3 involved 16,000 markers and a heritability of 0.2; and scenario 4 involved 16,000 markers and a heritability of 0.6. The results showed that, among the fully Bayesian and cross-validation approaches, BL_Gamma, BL_Beta, and BL_CV_MSE provided the highest prediction accuracy (PA) in scenarios 1 and 3. With increased marker density and heritability (scenario 4), the cross-validation approaches performed slightly better. The information-criteria-based methods demonstrated the lowest PA. Increasing heritability decreased the model penalty on the regression coefficients, whereas increasing marker density increased it. The PA obtained in the target population ranged from 0.210 to 0.413 in scenario 1, 0.402 to 0.600 in scenario 2, 0.256 to 0.442 in scenario 3, and 0.478 to 0.653 in scenario 4. In general, fully Bayesian approaches based on random priors for the regularization parameter are recommended for BL, as they provide acceptable PA with lower computational loads.
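The L1 interpretation mentioned in the abstract can be sketched as follows. This is a minimal derivation, assuming a Gaussian likelihood with residual variance \sigma_e^2 and independent Laplace priors with rate \lambda on the marker effects \beta_j. The symbols y, X, \beta, p, and \sigma_e^2, and this particular parameterization, are assumptions for illustration and are not taken from the paper, which may scale the prior differently (for example, by conditioning it on the residual variance).

```latex
\begin{align*}
% Assumed Laplace prior on each marker effect (rate \lambda)
p(\beta_j \mid \lambda) &= \frac{\lambda}{2}\exp\left(-\lambda\lvert\beta_j\rvert\right), \\
% Negative log posterior under an assumed Gaussian likelihood
-\log p(\beta \mid y, \lambda, \sigma_e^2) &= \frac{1}{2\sigma_e^2}\lVert y - X\beta\rVert_2^2
  + \lambda\sum_{j=1}^{p}\lvert\beta_j\rvert + \text{const}, \\
% The posterior mode therefore solves an L1-penalized least-squares problem
\hat{\beta}_{\text{mode}} &= \arg\min_{\beta}\; \lVert y - X\beta\rVert_2^2
  + 2\sigma_e^2\lambda\,\lVert\beta\rVert_1 .
\end{align*}
```

Under these assumptions, a larger \lambda concentrates the prior near zero and shrinks the marker effects more strongly, which is consistent with the abstract's statement that the hyperparameter controls sparsity. Note that the equivalence holds for the posterior mode; Bayesian implementations commonly report posterior means, which are shrunk but not exactly zero.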
About the journal:
Mammalian Genome focuses on the experimental, theoretical and technical aspects of genetics, genomics, epigenetics and systems biology in mouse, human and other mammalian species, with an emphasis on the relationship between genotype and phenotype, elucidation of biological and disease pathways as well as experimental aspects of interventions, therapeutics, and precision medicine. The journal aims to publish high quality original papers that present novel findings in all areas of mammalian genetic research as well as review articles on areas of topical interest. The journal will also feature commentaries and editorials to inform readers of breakthrough discoveries as well as issues of research standards, policies and ethics.