Linear covariance selection model via ℓ1-penalization

IF 1.6 3区数学 Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computational Statistics & Data Analysis Pub Date : 2025-03-27 DOI:10.1016/j.csda.2025.108176

Kwan-Young Bak , Seongoh Park

{"title":"Linear covariance selection model via ℓ1-penalization","authors":"Kwan-Young Bak , Seongoh Park","doi":"10.1016/j.csda.2025.108176","DOIUrl":null,"url":null,"abstract":"<div><div>This paper presents a study on an <span><math><msub><mrow><mi>ℓ</mi></mrow><mrow><mn>1</mn></mrow></msub></math></span>-penalized covariance regression method. Conventional approaches in high-dimensional covariance estimation often lack the flexibility to integrate external information. As a remedy, we adopt the regression-based covariance modeling framework and introduce a linear covariance selection model (LCSM) to encompass a broader spectrum of covariance structures when covariate information is available. Unlike existing methods, we do not assume that the true covariance matrix can be exactly represented by a linear combination of known basis matrices. Instead, we adopt additional basis matrices for a portion of the covariance patterns not captured by the given bases. To estimate high-dimensional regression coefficients, we exploit the sparsity-inducing <span><math><msub><mrow><mi>ℓ</mi></mrow><mrow><mn>1</mn></mrow></msub></math></span>-penalization scheme. Our theoretical analyses are based on the (symmetric) matrix regression model with additive random error matrix, which allows us to establish new non-asymptotic convergence rates of the proposed covariance estimator. The proposed method is implemented with the coordinate descent algorithm. We conduct empirical evaluation on simulated data to complement theoretical findings and underscore the efficacy of our approach. To show a practical applicability of our method, we further apply it to the co-expression analysis of liver gene expression data where the given basis corresponds to the adjacency matrix of the co-expression network.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"209 ","pages":"Article 108176"},"PeriodicalIF":1.6000,"publicationDate":"2025-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational Statistics & Data Analysis","FirstCategoryId":"100","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167947325000520","RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}

引用次数: 0

Abstract

This paper presents a study on an

ℓ_{1}

-penalized covariance regression method. Conventional approaches in high-dimensional covariance estimation often lack the flexibility to integrate external information. As a remedy, we adopt the regression-based covariance modeling framework and introduce a linear covariance selection model (LCSM) to encompass a broader spectrum of covariance structures when covariate information is available. Unlike existing methods, we do not assume that the true covariance matrix can be exactly represented by a linear combination of known basis matrices. Instead, we adopt additional basis matrices for a portion of the covariance patterns not captured by the given bases. To estimate high-dimensional regression coefficients, we exploit the sparsity-inducing

ℓ_{1}

-penalization scheme. Our theoretical analyses are based on the (symmetric) matrix regression model with additive random error matrix, which allows us to establish new non-asymptotic convergence rates of the proposed covariance estimator. The proposed method is implemented with the coordinate descent algorithm. We conduct empirical evaluation on simulated data to complement theoretical findings and underscore the efficacy of our approach. To show a practical applicability of our method, we further apply it to the co-expression analysis of liver gene expression data where the given basis corresponds to the adjacency matrix of the co-expression network.

查看原文本刊更多论文

基于_1惩罚的线性协方差选择模型

本文研究了一种l1惩罚的协方差回归方法。传统的高维协方差估计方法往往缺乏集成外部信息的灵活性。作为补救措施，我们采用基于回归的协方差建模框架，并引入线性协方差选择模型（LCSM），以在协方差信息可用时涵盖更广泛的协方差结构。与现有方法不同，我们不假设真正的协方差矩阵可以由已知基矩阵的线性组合精确表示。相反，我们采用额外的基矩阵来处理未被给定基捕获的部分协方差模式。为了估计高维回归系数，我们利用稀疏性诱导的1-惩罚方案。我们的理论分析是基于具有加性随机误差矩阵的（对称）矩阵回归模型，这使我们能够建立新的协方差估计的非渐近收敛率。该方法采用坐标下降算法实现。我们对模拟数据进行实证评估，以补充理论发现，并强调我们方法的有效性。为了显示我们的方法的实际适用性，我们进一步将其应用于肝脏基因表达数据的共表达分析，其中给定的基对应于共表达网络的邻接矩阵。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Computational Statistics & Data Analysis 数学-计算机：跨学科应用

CiteScore

3.70

自引率

5.60%

发文量

167

审稿时长

60 days

期刊介绍： Computational Statistics and Data Analysis (CSDA), an Official Publication of the network Computational and Methodological Statistics (CMStatistics) and of the International Association for Statistical Computing (IASC), is an international journal dedicated to the dissemination of methodological research and applications in the areas of computational statistics and data analysis. The journal consists of four refereed sections which are divided into the following subject areas: I) Computational Statistics - Manuscripts dealing with: 1) the explicit impact of computers on statistical methodology (e.g., Bayesian computing, bioinformatics,computer graphics, computer intensive inferential methods, data exploration, data mining, expert systems, heuristics, knowledge based systems, machine learning, neural networks, numerical and optimization methods, parallel computing, statistical databases, statistical systems), and 2) the development, evaluation and validation of statistical software and algorithms. Software and algorithms can be submitted with manuscripts and will be stored together with the online article. II) Statistical Methodology for Data Analysis - Manuscripts dealing with novel and original data analytical strategies and methodologies applied in biostatistics (design and analytic methods for clinical trials, epidemiological studies, statistical genetics, or genetic/environmental interactions), chemometrics, classification, data exploration, density estimation, design of experiments, environmetrics, education, image analysis, marketing, model free data exploration, pattern recognition, psychometrics, statistical physics, image processing, robust procedures. [...] III) Special Applications - [...] IV) Annals of Statistical Data Science [...]