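The effect of different approaches to determining the regularization parameter of Bayesian LASSO on the accuracy of genomic prediction.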

IF 2.7 · Zone 4 (Biology) · JCR Q3, Biochemistry & Molecular Biology
Mammalian Genome · Pub Date: 2025-03-01 · Epub Date: 2024-12-11 · DOI: 10.1007/s00335-024-10088-7
Hamid Sahebalam, Mohsen Gholizadeh, Seyed Hassan Hafezian
Pages: 331-345 · Publication type: Journal Article
Citations: 0

Abstract

The effect of different approaches to determining the regularization parameter of Bayesian LASSO on the accuracy of genomic prediction.

Dense genomic markers open up new opportunities and challenges for breeding programs. When dense markers are available, penalizing marker-specific regression coefficients becomes particularly important, so fitting marker effects to observations with a regularization technique such as Bayesian LASSO (BL) regression is of great interest. When a Laplace prior distribution is placed on the regression coefficients, BL can be interpreted as L1-norm regularization within the Bayesian framework. A critical issue is the appropriate selection of hyperparameter values in the prior distributions of regularization techniques, as these values essentially control the sparsity of the estimated model. The purpose of this study was to evaluate different approaches for selecting the regularization parameter in BL: fully Bayesian approaches, such as a gamma prior (BL_Gamma), a beta prior (BL_Beta) and a fixed prior (BL_Fixed); data-driven approaches, such as cross-validation based on mean square error (BL_CV_MSE) and on prediction accuracy (BL_CV_PA); and information-criteria-based methods, including Akaike's information criterion (BL_AIC), the Bayesian information criterion (BL_BIC) and the deviance information criterion (BL_DIC). For this purpose, a genome containing eight chromosomes (each 1 Morgan in length) with 100 randomly distributed quantitative trait loci was simulated. Four scenarios were studied: scenario 1 involved 4000 markers and a heritability of 0.2; scenario 2, 4000 markers and a heritability of 0.6; scenario 3, 16,000 markers and a heritability of 0.2; and scenario 4, 16,000 markers and a heritability of 0.6.

The results showed that, among the fully Bayesian and cross-validation approaches, BL_Gamma, BL_Beta and BL_CV_MSE provided the highest prediction accuracy (PA) in scenarios 1 and 3. With increased marker density and heritability (scenario 4), the cross-validation approaches performed slightly better. The information-criteria-based methods gave the lowest PA. Increasing heritability decreased the model penalty on the regression coefficients, whereas increasing marker density increased it. The PA obtained in the target population ranged from 0.210 to 0.413 in scenario 1, 0.402 to 0.600 in scenario 2, 0.256 to 0.442 in scenario 3, and 0.478 to 0.653 in scenario 4. In general, fully Bayesian approaches based on random priors for the regularization parameter are recommended for BL, as they provide acceptable PA with lower computational loads.
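To make the "fully Bayesian" route concrete, here is a minimal sketch of a Bayesian LASSO Gibbs sampler with a gamma prior on the squared regularization parameter, in the spirit of BL_Gamma. It assumes the standard hierarchical formulation of Park and Casella (2008), which the abstract does not confirm as the authors' implementation. The function names (`simulate`, `bayesian_lasso_gibbs`), the toy data sizes (300 animals, 50 markers, 5 QTL), and the hyperparameters `r` and `delta` are illustrative assumptions, far smaller than the 4000 to 16,000 markers of the study.

```python
import numpy as np

def simulate(n=300, p=50, n_qtl=5, h2=0.6, seed=0):
    """Toy data (assumed sizes): centered 0/1/2 marker codes, a few QTL, noise scaled to h2."""
    rng = np.random.default_rng(seed)
    X = rng.binomial(2, 0.5, size=(n, p)).astype(float)
    X -= X.mean(axis=0)
    beta = np.zeros(p)
    beta[rng.choice(p, size=n_qtl, replace=False)] = rng.normal(size=n_qtl)
    g = X @ beta                               # true genetic values
    var_e = g.var() * (1.0 - h2) / h2          # residual variance implied by h2
    y = g + rng.normal(scale=np.sqrt(var_e), size=n)
    return X, y, g

def bayesian_lasso_gibbs(X, y, n_iter=600, burn_in=200, r=1.0, delta=1.0, seed=1):
    """Gibbs sampler for Bayesian LASSO; lambda^2 has a Gamma(r, delta) prior (assumed values)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    sigma2, lam2 = y.var(), 1.0
    inv_tau2 = np.ones(p)
    beta_sum, lam2_draws = np.zeros(p), []
    for it in range(n_iter):
        # beta | rest ~ N(A^{-1} X'y, sigma2 * A^{-1}), with A = X'X + diag(1 / tau_j^2)
        A = XtX + np.diag(inv_tau2)
        L = np.linalg.cholesky(A)
        mean = np.linalg.solve(A, Xty)
        beta = mean + np.sqrt(sigma2) * np.linalg.solve(L.T, rng.standard_normal(p))
        # sigma2 | rest ~ Inverse-Gamma((n - 1 + p) / 2, (RSS + beta' D^{-1} beta) / 2)
        resid = y - X @ beta
        scale = (resid @ resid + beta @ (inv_tau2 * beta)) / 2.0
        sigma2 = scale / rng.gamma((n - 1 + p) / 2.0)
        # 1 / tau_j^2 | rest ~ Inverse-Gaussian(mean = sqrt(lam2 * sigma2 / beta_j^2), shape = lam2)
        mu = np.sqrt(lam2 * sigma2 / np.maximum(beta ** 2, 1e-8))
        inv_tau2 = np.maximum(rng.wald(mu, lam2), 1e-12)
        # lam2 | rest ~ Gamma(shape = p + r, rate = sum(tau_j^2) / 2 + delta)
        rate = np.sum(1.0 / inv_tau2) / 2.0 + delta
        lam2 = rng.gamma(p + r, 1.0 / rate)
        if it >= burn_in:
            beta_sum += beta
            lam2_draws.append(lam2)
    return beta_sum / (n_iter - burn_in), np.array(lam2_draws)

# Train on the first 200 records, then predict the remaining 100 as a stand-in "target population".
X, y, g = simulate()
train, test = slice(0, 200), slice(200, 300)
beta_hat, lam2_draws = bayesian_lasso_gibbs(X[train], y[train] - y[train].mean())
pa = np.corrcoef(X[test] @ beta_hat, g[test])[0, 1]   # PA: correlation with true genetic values
print(f"posterior mean of lambda^2: {lam2_draws.mean():.3f}, PA in target set: {pa:.3f}")
```

In this sketch the regularization parameter is sampled alongside the marker effects in a single chain. A fixed-parameter variant would simply skip the `lam2` update, whereas a cross-validation approach would have to rerun the whole sampler for every candidate value and fold, which is one plausible reading of why the abstract credits the random-prior approaches with lower computational loads.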

Source journal
Mammalian Genome (Biology: Biochemistry & Molecular Biology)
CiteScore: 4.00
Self-citation rate: 0.00%
Articles published: 33
Review time: 6-12 weeks
Journal description: Mammalian Genome focuses on the experimental, theoretical and technical aspects of genetics, genomics, epigenetics and systems biology in mouse, human and other mammalian species, with an emphasis on the relationship between genotype and phenotype, elucidation of biological and disease pathways as well as experimental aspects of interventions, therapeutics, and precision medicine. The journal aims to publish high quality original papers that present novel findings in all areas of mammalian genetic research as well as review articles on areas of topical interest. The journal will also feature commentaries and editorials to inform readers of breakthrough discoveries as well as issues of research standards, policies and ethics.