Regularized regression can improve estimates of multivariate selection in the face of multicollinearity and limited data

Jacqueline L. Sztepanacz, David Houle
DOI: 10.1093/evlett/qrad064
Published: 2024-01-23 (Journal Article)
Citations: 0

Abstract

The breeder’s equation, Δz̄ = Gβ, allows us to understand how genetics (the genetic covariance matrix, G) and the vector of linear selection gradients, β, interact to generate evolutionary trajectories. Estimation of β using multiple regression of trait values on relative fitness revolutionized the way we study selection in laboratory and wild populations. However, multicollinearity, or correlation of predictors, can lead to very high variances of, and covariances between, elements of β, posing a challenge for the interpretation of the parameter estimates. This is particularly relevant in the era of big data, where the number of predictors may approach or exceed the number of observations. A common approach to multicollinear predictors is to discard some of them, thereby losing any information that might be gained from those traits. Using simulations, we show how, on the one hand, multicollinearity can result in inaccurate estimates of selection, and, on the other, how the removal of correlated phenotypes from the analyses can provide a misguided view of the targets of selection. We show that regularized regression, which places data-validated constraints on the magnitudes of individual elements of β, can produce more accurate estimates of the total strength and direction of multivariate selection in the presence of multicollinearity and limited data, and often has little cost when multicollinearity is low. We also compare standard and regularized regression estimates of selection in a reanalysis of three published case studies, showing that regularized regression can improve fitness predictions in independent data. Our results suggest that regularized regression is a valuable tool that can be used as an important complement to traditional least-squares estimates of selection. In some cases, its use can lead to improved predictions of individual fitness, and improved estimates of the total strength and direction of multivariate selection.
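The core idea can be illustrated with a minimal simulation, assuming ridge regression as the regularizer (the paper may compare several penalties; the trait dimensions, correlation strength, and true gradients below are illustrative assumptions, not values from the study). Strongly correlated traits inflate the variance of least-squares estimates of β, while a penalized fit with a cross-validated penalty shrinks the coefficients:

```python
# Sketch: OLS vs. ridge estimates of selection gradients (beta) under
# multicollinearity. All numeric values here are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression, RidgeCV

rng = np.random.default_rng(1)
n, p = 100, 6  # limited data, several correlated traits

# Phenotypic covariance matrix with strong pairwise correlations (r = 0.9)
P = np.full((p, p), 0.9)
np.fill_diagonal(P, 1.0)
Z = rng.multivariate_normal(np.zeros(p), P, size=n)  # trait values

beta_true = np.array([0.5, 0.0, -0.3, 0.0, 0.0, 0.2])  # true gradients
w = Z @ beta_true + rng.normal(scale=1.0, size=n)      # relative fitness

# Multiple regression of relative fitness on traits (Lande-Arnold style)
ols = LinearRegression().fit(Z, w)
# Ridge with a data-validated penalty chosen over a grid of alphas
ridge = RidgeCV(alphas=np.logspace(-3, 3, 25)).fit(Z, w)

ols_err = np.linalg.norm(ols.coef_ - beta_true)
ridge_err = np.linalg.norm(ridge.coef_ - beta_true)
print(f"OLS error in beta:   {ols_err:.3f}")
print(f"Ridge error in beta: {ridge_err:.3f}")
```

Because the penalty constrains the magnitudes of the individual elements of β, the ridge estimate trades a little bias for a large reduction in variance; with weakly correlated traits the penalty selected by cross-validation tends toward zero, so the cost of regularizing is small.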