Scalable network estimation with L 0 penalty.

IF 2.1 4区 数学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Statistical Analysis and Data Mining Pub Date : 2021-02-01 Epub Date: 2020-10-21 DOI:10.1002/sam.11483
Junghi Kim, Hongtu Zhu, Xiao Wang, Kim-Anh Do
{"title":"Scalable network estimation with <i>L</i> <sub>0</sub> penalty.","authors":"Junghi Kim,&nbsp;Hongtu Zhu,&nbsp;Xiao Wang,&nbsp;Kim-Anh Do","doi":"10.1002/sam.11483","DOIUrl":null,"url":null,"abstract":"<p><p>With the advent of high-throughput sequencing, an efficient computing strategy is required to deal with large genomic data sets. The challenge of estimating a large precision matrix has garnered substantial research attention for its direct application to discriminant analyses and graphical models. Most existing methods either use a lasso-type penalty that may lead to biased estimators or are computationally intensive, which prevents their applications to very large graphs. We propose using an <i>L</i> <sub>0</sub> penalty to estimate an ultra-large precision matrix (scalnetL0). We apply scalnetL0 to RNA-seq data from breast cancer patients represented in The Cancer Genome Atlas and find improved accuracy of classifications for survival times. The estimated precision matrix provides information about a large-scale co-expression network in breast cancer. Simulation studies demonstrate that scalnetL0 provides more accurate and efficient estimators, yielding shorter CPU time and less Frobenius loss on sparse learning for large-scale precision matrix estimation.</p>","PeriodicalId":48684,"journal":{"name":"Statistical Analysis and Data Mining","volume":null,"pages":null},"PeriodicalIF":2.1000,"publicationDate":"2021-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/sam.11483","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Statistical Analysis and Data Mining","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1002/sam.11483","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2020/10/21 0:00:00","PubModel":"Epub","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 1

Abstract

With the advent of high-throughput sequencing, an efficient computing strategy is required to deal with large genomic data sets. The challenge of estimating a large precision matrix has garnered substantial research attention for its direct application to discriminant analyses and graphical models. Most existing methods either use a lasso-type penalty that may lead to biased estimators or are computationally intensive, which prevents their applications to very large graphs. We propose using an L 0 penalty to estimate an ultra-large precision matrix (scalnetL0). We apply scalnetL0 to RNA-seq data from breast cancer patients represented in The Cancer Genome Atlas and find improved accuracy of classifications for survival times. The estimated precision matrix provides information about a large-scale co-expression network in breast cancer. Simulation studies demonstrate that scalnetL0 provides more accurate and efficient estimators, yielding shorter CPU time and less Frobenius loss on sparse learning for large-scale precision matrix estimation.

具有l0损失的可扩展网络估计。
随着高通量测序的出现,需要一种高效的计算策略来处理大量的基因组数据集。估计一个大精度矩阵的挑战已经获得了大量的研究关注,因为它直接应用于判别分析和图形模型。大多数现有的方法要么使用可能导致有偏估计的套索类型惩罚,要么是计算密集型的,这阻碍了它们在非常大的图中的应用。我们建议使用l0惩罚来估计超大精度矩阵(scalnetL0)。我们将scalnetL0应用于癌症基因组图谱中乳腺癌患者的RNA-seq数据,发现生存时间分类的准确性得到了提高。估计的精度矩阵提供了关于乳腺癌中大规模共表达网络的信息。仿真研究表明,scalnetL0提供了更准确和高效的估计器,在大规模精确矩阵估计的稀疏学习中产生更短的CPU时间和更少的Frobenius损失。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Statistical Analysis and Data Mining
Statistical Analysis and Data Mining COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCEC-COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
CiteScore
3.20
自引率
7.70%
发文量
43
期刊介绍: Statistical Analysis and Data Mining addresses the broad area of data analysis, including statistical approaches, machine learning, data mining, and applications. Topics include statistical and computational approaches for analyzing massive and complex datasets, novel statistical and/or machine learning methods and theory, and state-of-the-art applications with high impact. Of special interest are articles that describe innovative analytical techniques, and discuss their application to real problems, in such a way that they are accessible and beneficial to domain experts across science, engineering, and commerce. The focus of the journal is on papers which satisfy one or more of the following criteria: Solve data analysis problems associated with massive, complex datasets Develop innovative statistical approaches, machine learning algorithms, or methods integrating ideas across disciplines, e.g., statistics, computer science, electrical engineering, operation research. Formulate and solve high-impact real-world problems which challenge existing paradigms via new statistical and/or computational models Provide survey to prominent research topics.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信