Spectral Methods for Data Science: A Statistical Perspective

Yuxin Chen, Yuejie Chi, Jianqing Fan, Cong Ma
DOI: 10.1561/2200000079 (https://doi.org/10.1561/2200000079)
Journal: Found. Trends Mach. Learn.
Publication date: 2020-12-15
Publication type: Journal Article
Citations: 111

Abstract

Spectral methods have emerged as a simple yet surprisingly effective approach for extracting information from massive, noisy and incomplete data. In a nutshell, spectral methods refer to a collection of algorithms built upon the eigenvalues (resp. singular values) and eigenvectors (resp. singular vectors) of some properly designed matrices constructed from data. A diverse array of applications have been found in machine learning, data science, and signal processing. Due to their simplicity and effectiveness, spectral methods are not only used as a stand-alone estimator, but also frequently employed to initialize other more sophisticated algorithms to improve performance. While the studies of spectral methods can be traced back to classical matrix perturbation theory and methods of moments, the past decade has witnessed tremendous theoretical advances in demystifying their efficacy through the lens of statistical modeling, with the aid of non-asymptotic random matrix theory. This monograph aims to present a systematic, comprehensive, yet accessible introduction to spectral methods from a modern statistical perspective, highlighting their algorithmic implications in diverse large-scale applications. In particular, our exposition gravitates around several central questions that span various applications: how to characterize the sample efficiency of spectral methods in reaching a target level of statistical accuracy, and how to assess their stability in the face of random noise, missing data, and adversarial corruptions? In addition to conventional $\ell_2$ perturbation analysis, we present a systematic $\ell_{\infty}$ and $\ell_{2,\infty}$ perturbation theory for eigenspace and singular subspaces, which has only recently become available owing to a powerful "leave-one-out" analysis framework.
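As a rough illustration of what "built upon the eigenvectors of a matrix constructed from data" can look like, the sketch below estimates a planted unit vector from a noisy symmetric matrix and reports both the $\ell_2$ and $\ell_{\infty}$ estimation errors. Everything specific here is an assumption made for illustration and is not taken from the monograph: the rank-one-plus-Gaussian-noise model, the values of `n`, `strength`, and `sigma`, and the use of plain power iteration as the eigenvector routine.

```python
import math
import random


def power_iteration(matrix, iterations=100, seed=0):
    """Approximate the eigenvector of the largest-magnitude eigenvalue of a symmetric matrix."""
    rng = random.Random(seed)
    n = len(matrix)
    vec = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        # Multiply by the matrix, then renormalize to unit length.
        vec = [sum(row[j] * vec[j] for j in range(n)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
    return vec


def sign_aligned_errors(estimate, truth):
    """Return (l2, linf) errors after resolving the global sign ambiguity."""
    candidates = []
    for sign in (1.0, -1.0):
        diff = [sign * e - t for e, t in zip(estimate, truth)]
        l2 = math.sqrt(sum(d * d for d in diff))
        linf = max(abs(d) for d in diff)
        candidates.append((l2, linf))
    return min(candidates)  # the sign with the smaller l2 error


# Hypothetical model: observed = strength * u u^T + symmetric Gaussian noise.
rng = random.Random(1)
n, strength, sigma = 50, 10.0, 0.1
truth = [rng.choice((-1.0, 1.0)) / math.sqrt(n) for _ in range(n)]

noise = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i, n):
        noise[i][j] = noise[j][i] = rng.gauss(0.0, sigma)

observed = [
    [strength * truth[i] * truth[j] + noise[i][j] for j in range(n)]
    for i in range(n)
]

estimate = power_iteration(observed)
l2_error, linf_error = sign_aligned_errors(estimate, truth)
print(f"l2 error:   {l2_error:.4f}")
print(f"linf error: {linf_error:.4f}  (true entries have magnitude {1 / math.sqrt(n):.4f})")
```

The $\ell_{\infty}$ error bounds the worst error in any single coordinate, which is the kind of entrywise guarantee the $\ell_{\infty}$ and $\ell_{2,\infty}$ theory in the abstract is concerned with; the $\ell_2$ error only controls the error aggregated over all coordinates.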