{"title":"Leveraging independence in high-dimensional mixed linear regression.","authors":"Ning Wang, Kai Deng, Qing Mai, Xin Zhang","doi":"10.1093/biomtc/ujae103","DOIUrl":null,"url":null,"abstract":"<p><p>We address the challenge of estimating regression coefficients and selecting relevant predictors in the context of mixed linear regression in high dimensions, where the number of predictors greatly exceeds the sample size. Recent advancements in this field have centered on incorporating sparsity-inducing penalties into the expectation-maximization (EM) algorithm, which seeks to maximize the conditional likelihood of the response given the predictors. However, existing procedures often treat predictors as fixed or overlook their inherent variability. In this paper, we leverage the independence between the predictor and the latent indicator variable of mixtures to facilitate efficient computation and also achieve synergistic variable selection across all mixture components. We establish the non-asymptotic convergence rate of the proposed fast group-penalized EM estimator to the true regression parameters. The effectiveness of our method is demonstrated through extensive simulations and an application to the Cancer Cell Line Encyclopedia dataset for the prediction of anticancer drug sensitivity.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":null,"pages":null},"PeriodicalIF":1.4000,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biometrics","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1093/biomtc/ujae103","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"BIOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
We address the challenge of estimating regression coefficients and selecting relevant predictors in the context of mixed linear regression in high dimensions, where the number of predictors greatly exceeds the sample size. Recent advancements in this field have centered on incorporating sparsity-inducing penalties into the expectation-maximization (EM) algorithm, which seeks to maximize the conditional likelihood of the response given the predictors. However, existing procedures often treat predictors as fixed or overlook their inherent variability. In this paper, we leverage the independence between the predictor and the latent indicator variable of mixtures to facilitate efficient computation and also achieve synergistic variable selection across all mixture components. We establish the non-asymptotic convergence rate of the proposed fast group-penalized EM estimator to the true regression parameters. The effectiveness of our method is demonstrated through extensive simulations and an application to the Cancer Cell Line Encyclopedia dataset for the prediction of anticancer drug sensitivity.
在高维度混合线性回归中,预测因子的数量大大超过了样本量,我们要解决的难题是估计回归系数和选择相关预测因子。该领域的最新进展集中在将稀疏性诱导惩罚纳入期望最大化(EM)算法中,该算法旨在最大化给定预测因子的响应的条件可能性。然而,现有程序通常将预测因子视为固定的,或忽略其固有的可变性。在本文中,我们利用预测变量和混合物的潜在指示变量之间的独立性来提高计算效率,并在所有混合物成分中实现协同变量选择。我们确定了所提出的快速组惩罚 EM 估计器对真实回归参数的非渐近收敛率。我们通过大量的模拟和应用于癌症细胞系百科全书数据集来预测抗癌药物敏感性,从而证明了我们方法的有效性。
期刊介绍:
The International Biometric Society is an international society promoting the development and application of statistical and mathematical theory and methods in the biosciences, including agriculture, biomedical science and public health, ecology, environmental sciences, forestry, and allied disciplines. The Society welcomes as members statisticians, mathematicians, biological scientists, and others devoted to interdisciplinary efforts in advancing the collection and interpretation of information in the biosciences. The Society sponsors the biennial International Biometric Conference, held in sites throughout the world; through its National Groups and Regions, it also Society sponsors regional and local meetings.