面对多重共线性和有限数据，正则化回归可改进多变量选择的估计值

IF 4.7 3区材料科学 Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

ACS Applied Electronic Materials Pub Date : 2024-01-23 DOI:10.1093/evlett/qrad064

Jacqueline L. Sztepanacz, David Houle

{"title":"面对多重共线性和有限数据，正则化回归可改进多变量选择的估计值","authors":"Jacqueline L. Sztepanacz, David Houle","doi":"10.1093/evlett/qrad064","DOIUrl":null,"url":null,"abstract":"\n The breeder’s equation, Δz¯=Gβ , allows us to understand how genetics (the genetic covariance matrix, G) and the vector of linear selection gradients β interact to generate evolutionary trajectories. Estimation of β using multiple regression of trait values on relative fitness revolutionized the way we study selection in laboratory and wild populations. However, multicollinearity, or correlation of predictors, can lead to very high variances of and covariances between elements of β , posing a challenge for the interpretation of the parameter estimates. This is particularly relevant in the era of big data, where the number of predictors may approach or exceed the number of observations. A common approach to multicollinear predictors is to discard some of them, thereby losing any information that might be gained from those traits. Using simulations, we show how, on the one hand, multicollinearity can result in inaccurate estimates of selection, and, on the other, how the removal of correlated phenotypes from the analyses can provide a misguided view of the targets of selection. We show that regularized regression, which places data-validated constraints on the magnitudes of individual elements of β, can produce more accurate estimates of the total strength and direction of multivariate selection in the presence of multicollinearity and limited data, and often has little cost when multicollinearity is low. We also compare standard and regularized regression estimates of selection in a reanalysis of three published case studies, showing that regularized regression can improve fitness predictions in independent data. Our results suggest that regularized regression is a valuable tool that can be used as an important complement to traditional least-squares estimates of selection. In some cases, its use can lead to improved predictions of individual fitness, and improved estimates of the total strength and direction of multivariate selection.","PeriodicalId":3,"journal":{"name":"ACS Applied Electronic Materials","volume":"130 4","pages":""},"PeriodicalIF":4.7000,"publicationDate":"2024-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Regularized regression can improve estimates of multivariate selection in the face of multicollinearity and limited data\",\"authors\":\"Jacqueline L. Sztepanacz, David Houle\",\"doi\":\"10.1093/evlett/qrad064\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"\\n The breeder’s equation, Δz¯=Gβ , allows us to understand how genetics (the genetic covariance matrix, G) and the vector of linear selection gradients β interact to generate evolutionary trajectories. Estimation of β using multiple regression of trait values on relative fitness revolutionized the way we study selection in laboratory and wild populations. However, multicollinearity, or correlation of predictors, can lead to very high variances of and covariances between elements of β , posing a challenge for the interpretation of the parameter estimates. This is particularly relevant in the era of big data, where the number of predictors may approach or exceed the number of observations. A common approach to multicollinear predictors is to discard some of them, thereby losing any information that might be gained from those traits. Using simulations, we show how, on the one hand, multicollinearity can result in inaccurate estimates of selection, and, on the other, how the removal of correlated phenotypes from the analyses can provide a misguided view of the targets of selection. We show that regularized regression, which places data-validated constraints on the magnitudes of individual elements of β, can produce more accurate estimates of the total strength and direction of multivariate selection in the presence of multicollinearity and limited data, and often has little cost when multicollinearity is low. We also compare standard and regularized regression estimates of selection in a reanalysis of three published case studies, showing that regularized regression can improve fitness predictions in independent data. Our results suggest that regularized regression is a valuable tool that can be used as an important complement to traditional least-squares estimates of selection. In some cases, its use can lead to improved predictions of individual fitness, and improved estimates of the total strength and direction of multivariate selection.\",\"PeriodicalId\":3,\"journal\":{\"name\":\"ACS Applied Electronic Materials\",\"volume\":\"130 4\",\"pages\":\"\"},\"PeriodicalIF\":4.7000,\"publicationDate\":\"2024-01-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACS Applied Electronic Materials\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1093/evlett/qrad064\",\"RegionNum\":3,\"RegionCategory\":\"材料科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Electronic Materials","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/evlett/qrad064","RegionNum":3,"RegionCategory":"材料科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}

引用次数: 0

摘要

育种方程 Δz¯=Gβ 使我们能够了解遗传学（遗传协方差矩阵 G）和线性选择梯度矢量 β 是如何相互作用产生进化轨迹的。利用性状值对相对适合度的多元回归来估计β，彻底改变了我们研究实验室和野生种群选择的方法。然而，多重共线性或预测因子的相关性会导致 β 的方差和各元素之间的协方差非常大，给参数估计的解释带来了挑战。这在大数据时代尤为重要，因为预测因子的数量可能接近或超过观测值的数量。处理多重共线性预测因子的常见方法是舍弃其中一些，从而失去从这些特征中可能获得的任何信息。通过模拟，我们一方面展示了多重共线性如何导致选择的估计值不准确，另一方面也展示了从分析中剔除相关表型如何导致对选择目标的误解。我们的研究表明，正则化回归对 β 的单个元素的大小施加了经过数据验证的限制，在存在多重共线性和数据有限的情况下，正则化回归可以对多元选择的总强度和方向做出更准确的估计，而且在多重共线性较低的情况下，正则化回归的成本往往很低。我们还在对已发表的三个案例研究的重新分析中，比较了标准回归和正则化回归对选择的估计，结果表明正则化回归可以改善独立数据中的适配性预测。我们的研究结果表明，正则化回归是一种有价值的工具，可以作为传统最小二乘选择估计值的重要补充。在某些情况下，使用正则化回归可以提高对个体适应性的预测，并改进对多元选择的总强度和方向的估计。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Regularized regression can improve estimates of multivariate selection in the face of multicollinearity and limited data

The breeder’s equation, Δz¯=Gβ , allows us to understand how genetics (the genetic covariance matrix, G) and the vector of linear selection gradients β interact to generate evolutionary trajectories. Estimation of β using multiple regression of trait values on relative fitness revolutionized the way we study selection in laboratory and wild populations. However, multicollinearity, or correlation of predictors, can lead to very high variances of and covariances between elements of β , posing a challenge for the interpretation of the parameter estimates. This is particularly relevant in the era of big data, where the number of predictors may approach or exceed the number of observations. A common approach to multicollinear predictors is to discard some of them, thereby losing any information that might be gained from those traits. Using simulations, we show how, on the one hand, multicollinearity can result in inaccurate estimates of selection, and, on the other, how the removal of correlated phenotypes from the analyses can provide a misguided view of the targets of selection. We show that regularized regression, which places data-validated constraints on the magnitudes of individual elements of β, can produce more accurate estimates of the total strength and direction of multivariate selection in the presence of multicollinearity and limited data, and often has little cost when multicollinearity is low. We also compare standard and regularized regression estimates of selection in a reanalysis of three published case studies, showing that regularized regression can improve fitness predictions in independent data. Our results suggest that regularized regression is a valuable tool that can be used as an important complement to traditional least-squares estimates of selection. In some cases, its use can lead to improved predictions of individual fitness, and improved estimates of the total strength and direction of multivariate selection.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACS Applied Electronic Materials Multiple-

CiteScore

7.20

自引率

4.30%

发文量

567

期刊介绍： ACS Applied Electronic Materials is an interdisciplinary journal publishing original research covering all aspects of electronic materials. The journal is devoted to reports of new and original experimental and theoretical research of an applied nature that integrate knowledge in the areas of materials science, engineering, optics, physics, and chemistry into important applications of electronic materials. Sample research topics that span the journal's scope are inorganic, organic, ionic and polymeric materials with properties that include conducting, semiconducting, superconducting, insulating, dielectric, magnetic, optoelectronic, piezoelectric, ferroelectric and thermoelectric. Indexed/Abstracted： Web of Science SCIE Scopus CAS INSPEC Portico