Scalable network estimation with L ₀ penalty.

IF 3.6 4区数学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Statistical Analysis and Data Mining Pub Date : 2021-02-01 Epub Date: 2020-10-21 DOI:10.1002/sam.11483

Junghi Kim, Hongtu Zhu, Xiao Wang, Kim-Anh Do

{"title":"Scalable network estimation with L 0 penalty.","authors":"Junghi Kim, Hongtu Zhu, Xiao Wang, Kim-Anh Do","doi":"10.1002/sam.11483","DOIUrl":null,"url":null,"abstract":"With the advent of high-throughput sequencing, an efficient computing strategy is required to deal with large genomic data sets. The challenge of estimating a large precision matrix has garnered substantial research attention for its direct application to discriminant analyses and graphical models. Most existing methods either use a lasso-type penalty that may lead to biased estimators or are computationally intensive, which prevents their applications to very large graphs. We propose using an L 0 penalty to estimate an ultra-large precision matrix (scalnetL0). We apply scalnetL0 to RNA-seq data from breast cancer patients represented in The Cancer Genome Atlas and find improved accuracy of classifications for survival times. The estimated precision matrix provides information about a large-scale co-expression network in breast cancer. Simulation studies demonstrate that scalnetL0 provides more accurate and efficient estimators, yielding shorter CPU time and less Frobenius loss on sparse learning for large-scale precision matrix estimation.","PeriodicalId":48684,"journal":{"name":"Statistical Analysis and Data Mining","volume":"14 1","pages":"18-30"},"PeriodicalIF":3.6000,"publicationDate":"2021-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/sam.11483","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Statistical Analysis and Data Mining","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1002/sam.11483","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2020/10/21 0:00:00","PubModel":"Epub","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 1

Abstract

With the advent of high-throughput sequencing, an efficient computing strategy is required to deal with large genomic data sets. The challenge of estimating a large precision matrix has garnered substantial research attention for its direct application to discriminant analyses and graphical models. Most existing methods either use a lasso-type penalty that may lead to biased estimators or are computationally intensive, which prevents their applications to very large graphs. We propose using an L ₀ penalty to estimate an ultra-large precision matrix (scalnetL0). We apply scalnetL0 to RNA-seq data from breast cancer patients represented in The Cancer Genome Atlas and find improved accuracy of classifications for survival times. The estimated precision matrix provides information about a large-scale co-expression network in breast cancer. Simulation studies demonstrate that scalnetL0 provides more accurate and efficient estimators, yielding shorter CPU time and less Frobenius loss on sparse learning for large-scale precision matrix estimation.

查看原文本刊更多论文

具有l0损失的可扩展网络估计。

随着高通量测序的出现，需要一种高效的计算策略来处理大量的基因组数据集。估计一个大精度矩阵的挑战已经获得了大量的研究关注，因为它直接应用于判别分析和图形模型。大多数现有的方法要么使用可能导致有偏估计的套索类型惩罚，要么是计算密集型的，这阻碍了它们在非常大的图中的应用。我们建议使用l0惩罚来估计超大精度矩阵(scalnetL0)。我们将scalnetL0应用于癌症基因组图谱中乳腺癌患者的RNA-seq数据，发现生存时间分类的准确性得到了提高。估计的精度矩阵提供了关于乳腺癌中大规模共表达网络的信息。仿真研究表明，scalnetL0提供了更准确和高效的估计器，在大规模精确矩阵估计的稀疏学习中产生更短的CPU时间和更少的Frobenius损失。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Statistical Analysis and Data Mining COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCEC-COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

CiteScore

3.20

自引率

7.70%

发文量

期刊介绍： Statistical Analysis and Data Mining addresses the broad area of data analysis, including statistical approaches, machine learning, data mining, and applications. Topics include statistical and computational approaches for analyzing massive and complex datasets, novel statistical and/or machine learning methods and theory, and state-of-the-art applications with high impact. Of special interest are articles that describe innovative analytical techniques, and discuss their application to real problems, in such a way that they are accessible and beneficial to domain experts across science, engineering, and commerce. The focus of the journal is on papers which satisfy one or more of the following criteria: Solve data analysis problems associated with massive, complex datasets Develop innovative statistical approaches, machine learning algorithms, or methods integrating ideas across disciplines, e.g., statistics, computer science, electrical engineering, operation research. Formulate and solve high-impact real-world problems which challenge existing paradigms via new statistical and/or computational models Provide survey to prominent research topics.