从高维噪声数据中学习低维非线性结构:一种积分算子方法

IF 3.2 1区 数学 Q1 STATISTICS & PROBABILITY
Xiucai Ding, Rong Ma
{"title":"从高维噪声数据中学习低维非线性结构:一种积分算子方法","authors":"Xiucai Ding, Rong Ma","doi":"10.1214/23-aos2306","DOIUrl":null,"url":null,"abstract":"We propose a kernel-spectral embedding algorithm for learning low-dimensional nonlinear structures from noisy and high-dimensional observations, where the data sets are assumed to be sampled from a nonlinear manifold model and corrupted by high-dimensional noise. The algorithm employs an adaptive bandwidth selection procedure which does not rely on prior knowledge of the underlying manifold. The obtained low-dimensional embeddings can be further utilized for downstream purposes such as data visualization, clustering and prediction. Our method is theoretically justified and practically interpretable. Specifically, for a general class of kernel functions, we establish the convergence of the final embeddings to their noiseless counterparts when the dimension grows polynomially with the size, and characterize the effect of the signal-to-noise ratio on the rate of convergence and phase transition. We also prove the convergence of the embeddings to the eigenfunctions of an integral operator defined by the kernel map of some reproducing kernel Hilbert space capturing the underlying nonlinear structures. Our results hold even when the dimension of the manifold grows with the sample size. Numerical simulations and analysis of real data sets show the superior empirical performance of the proposed method, compared to many existing methods, on learning various nonlinear manifolds in diverse applications.","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":null,"pages":null},"PeriodicalIF":3.2000,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Learning low-dimensional nonlinear structures from high-dimensional noisy data: An integral operator approach\",\"authors\":\"Xiucai Ding, Rong Ma\",\"doi\":\"10.1214/23-aos2306\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We propose a kernel-spectral embedding algorithm for learning low-dimensional nonlinear structures from noisy and high-dimensional observations, where the data sets are assumed to be sampled from a nonlinear manifold model and corrupted by high-dimensional noise. The algorithm employs an adaptive bandwidth selection procedure which does not rely on prior knowledge of the underlying manifold. The obtained low-dimensional embeddings can be further utilized for downstream purposes such as data visualization, clustering and prediction. Our method is theoretically justified and practically interpretable. Specifically, for a general class of kernel functions, we establish the convergence of the final embeddings to their noiseless counterparts when the dimension grows polynomially with the size, and characterize the effect of the signal-to-noise ratio on the rate of convergence and phase transition. We also prove the convergence of the embeddings to the eigenfunctions of an integral operator defined by the kernel map of some reproducing kernel Hilbert space capturing the underlying nonlinear structures. Our results hold even when the dimension of the manifold grows with the sample size. Numerical simulations and analysis of real data sets show the superior empirical performance of the proposed method, compared to many existing methods, on learning various nonlinear manifolds in diverse applications.\",\"PeriodicalId\":8032,\"journal\":{\"name\":\"Annals of Statistics\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":3.2000,\"publicationDate\":\"2023-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Annals of Statistics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1214/23-aos2306\",\"RegionNum\":1,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"STATISTICS & PROBABILITY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annals of Statistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1214/23-aos2306","RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
引用次数: 0

摘要

我们提出了一种核谱嵌入算法,用于从噪声和高维观测中学习低维非线性结构,其中数据集被假设从非线性流形模型中采样并被高维噪声破坏。该算法采用了一种不依赖于先验知识的自适应带宽选择方法。得到的低维嵌入可以进一步用于下游目的,如数据可视化、聚类和预测。我们的方法在理论上是合理的,在实践上是可以解释的。具体来说,对于一类一般的核函数,我们建立了当维数随大小多项式增长时,最终嵌入到其无噪声对应物的收敛性,并表征了信噪比对收敛速度和相变的影响。我们也证明了嵌入到一个积分算子的特征函数的收敛性,这个积分算子是由捕获底层非线性结构的再现核希尔伯特空间的核映射所定义的。即使流形的尺寸随着样本量的增加而增加,我们的结果仍然成立。数值模拟和实际数据集分析表明,与许多现有方法相比,该方法在学习各种应用中的各种非线性流形方面具有优越的经验性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Learning low-dimensional nonlinear structures from high-dimensional noisy data: An integral operator approach
We propose a kernel-spectral embedding algorithm for learning low-dimensional nonlinear structures from noisy and high-dimensional observations, where the data sets are assumed to be sampled from a nonlinear manifold model and corrupted by high-dimensional noise. The algorithm employs an adaptive bandwidth selection procedure which does not rely on prior knowledge of the underlying manifold. The obtained low-dimensional embeddings can be further utilized for downstream purposes such as data visualization, clustering and prediction. Our method is theoretically justified and practically interpretable. Specifically, for a general class of kernel functions, we establish the convergence of the final embeddings to their noiseless counterparts when the dimension grows polynomially with the size, and characterize the effect of the signal-to-noise ratio on the rate of convergence and phase transition. We also prove the convergence of the embeddings to the eigenfunctions of an integral operator defined by the kernel map of some reproducing kernel Hilbert space capturing the underlying nonlinear structures. Our results hold even when the dimension of the manifold grows with the sample size. Numerical simulations and analysis of real data sets show the superior empirical performance of the proposed method, compared to many existing methods, on learning various nonlinear manifolds in diverse applications.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Annals of Statistics
Annals of Statistics 数学-统计学与概率论
CiteScore
9.30
自引率
8.90%
发文量
119
审稿时长
6-12 weeks
期刊介绍: The Annals of Statistics aim to publish research papers of highest quality reflecting the many facets of contemporary statistics. Primary emphasis is placed on importance and originality, not on formalism. The journal aims to cover all areas of statistics, especially mathematical statistics and applied & interdisciplinary statistics. Of course many of the best papers will touch on more than one of these general areas, because the discipline of statistics has deep roots in mathematics, and in substantive scientific fields.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信