Latest Publications: SIAM Journal on Mathematics of Data Science

Algorithmic Regularization in Model-Free Overparametrized Asymmetric Matrix Factorization
SIAM Journal on Mathematics of Data Science. Pub Date: 2023-08-11. DOI: 10.1137/22m1519833
Authors: Liwei Jiang, Yudong Chen, Lijun Ding
Abstract: We study the asymmetric matrix factorization problem under a natural nonconvex formulation with arbitrary overparametrization. The model-free setting is considered, with minimal assumptions on the rank or singular values of the observed matrix, where the global optima provably overfit. We show that vanilla gradient descent with small random initialization sequentially recovers the principal components of the observed matrix. Consequently, when equipped with proper early stopping, gradient descent produces the best low-rank approximation of the observed matrix without explicit regularization. We provide a sharp characterization of the relationship between the approximation error, iteration complexity, initialization size, and stepsize. Our complexity bound is almost dimension-free and depends logarithmically on the approximation error, with significantly more lenient requirements on the stepsize and initialization compared to prior work. Our theoretical results provide accurate predictions of the behavior of gradient descent, showing good agreement with numerical experiments.
Citations: 1

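The procedure the abstract describes lends itself to a short numerical sketch. Below is a minimal NumPy illustration, not the authors' code: vanilla gradient descent on the factorized objective ||M - F G^T||_F^2 with overparametrized width k and a small random initialization. The stepsize, initialization scale, and iteration budget are illustrative assumptions.

```python
import numpy as np

def gd_low_rank(M, k, step=1e-3, init_scale=1e-6, iters=5000):
    m, n = M.shape
    rng = np.random.default_rng(0)
    F = init_scale * rng.standard_normal((m, k))   # width k may far exceed rank(M)
    G = init_scale * rng.standard_normal((n, k))
    for _ in range(iters):
        R = F @ G.T - M                            # residual of the current factorization
        F, G = F - step * (R @ G), G - step * (R.T @ F)
    return F, G
```

Early stopping, in this reading, would monitor the approximation error of F G^T over the iterations and halt once the leading principal components have been captured, before the later iterations begin to overfit.
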
Probabilistic Registration for Gaussian Process Three-Dimensional Shape Modelling in the Presence of Extensive Missing Data
SIAM Journal on Mathematics of Data Science. Pub Date: 2023-06-26. DOI: 10.1137/22m1495494
Authors: Filipa Valdeira, Ricardo Ferreira, Alessandra Micheletti, Cláudia Soares
Citations: 0

Wassmap: Wasserstein Isometric Mapping for Image Manifold Learning
SIAM Journal on Mathematics of Data Science. Pub Date: 2023-06-07. DOI: 10.1137/22m1490053
Authors: Keaton Hamm, Nick Henscheid, Shujie Kang
Abstract: In this paper, we propose Wasserstein Isometric Mapping (Wassmap), a nonlinear dimensionality reduction technique that addresses some drawbacks of existing global nonlinear dimensionality reduction algorithms in imaging applications. Wassmap represents images via probability measures in Wasserstein space, then uses pairwise Wasserstein distances between the associated measures to produce a low-dimensional, approximately isometric embedding. We show that the algorithm is able to exactly recover parameters of some image manifolds, including those generated by translations or dilations of a fixed generating measure. Additionally, we show that a discrete version of the algorithm retrieves parameters from manifolds generated from discrete measures by providing a theoretical bridge to transfer recovery results from functional data to discrete data. Testing of the proposed algorithms on various image data manifolds shows that Wassmap yields good embeddings compared with other global and local techniques.
Citations: 1

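As a concrete reading of the pipeline, the sketch below treats each image as a discrete probability measure on its pixel grid, computes pairwise squared 2-Wasserstein distances, and embeds with classical multidimensional scaling. It assumes the POT optimal-transport package and small images (the full ground-cost matrix is dense); it is an interpretation of the abstract, not the authors' reference implementation.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)

def wassmap(images, embed_dim=2):
    h, w = images[0].shape
    ii, jj = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    grid = np.column_stack([ii.ravel(), jj.ravel()]).astype(float)
    cost = ot.dist(grid, grid)                      # squared Euclidean ground cost on pixels
    mus = [im.ravel() / im.sum() for im in images]  # nonnegative images as probability measures
    n = len(images)
    D2 = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D2[i, j] = D2[j, i] = ot.emd2(mus[i], mus[j], cost)  # squared W2 distance
    J = np.eye(n) - np.ones((n, n)) / n             # centering matrix
    B = -0.5 * J @ D2 @ J                           # classical MDS Gram matrix
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:embed_dim]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))
```
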
Time-Inhomogeneous Diffusion Geometry and Topology
SIAM Journal on Mathematics of Data Science. Pub Date: 2023-05-22. DOI: 10.1137/21m1462945
Authors: Guillaume Huguet, Alexander Tong, Bastian Rieck, Jessie Huang, Manik Kuchroo, Matthew Hirn, Guy Wolf, Smita Krishnaswamy
Abstract: Diffusion condensation is a dynamic process that yields a sequence of multiscale data representations that aim to encode meaningful abstractions. It has proven effective for manifold learning, denoising, clustering, and visualization of high-dimensional data. Diffusion condensation is constructed as a time-inhomogeneous process where each step first computes a diffusion operator and then applies it to the data. We theoretically analyze the convergence and evolution of this process from geometric, spectral, and topological perspectives. From a geometric perspective, we obtain convergence bounds based on the smallest transition probability and the radius of the data, whereas from a spectral perspective, our bounds are based on the eigenspectrum of the diffusion kernel. Our spectral results are of particular interest since most of the literature on data diffusion is focused on homogeneous processes. From a topological perspective, we show that diffusion condensation generalizes centroid-based hierarchical clustering. We use this perspective to obtain a bound based on the number of data points, independent of their location. To understand the evolution of the data geometry beyond convergence, we use topological data analysis. We show that the condensation process itself defines an intrinsic condensation homology. We use this intrinsic topology, as well as the ambient persistent homology, of the condensation process to study how the data changes over diffusion time. We demonstrate both types of topological information in well-understood toy examples. Our work gives theoretical insight into the convergence of diffusion condensation and shows that it provides a link between topological and geometric data analysis.
Citations: 1

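The process the abstract describes, alternating between building a diffusion operator from the current point positions and applying it to the points themselves, can be sketched in a few lines. The Gaussian kernel with fixed bandwidth and the simple stopping rule below are simplifying assumptions; the paper's analysis covers more general time-inhomogeneous constructions.

```python
import numpy as np
from scipy.spatial.distance import cdist

def diffusion_condensation(X, bandwidth=1.0, max_steps=50, tol=1e-6):
    X = np.asarray(X, dtype=float).copy()
    trajectory = [X.copy()]                           # multiscale sequence of representations
    for _ in range(max_steps):
        K = np.exp(-cdist(X, X, "sqeuclidean") / bandwidth**2)
        P = K / K.sum(axis=1, keepdims=True)          # row-stochastic diffusion operator
        X_next = P @ X                                # each point moves toward its neighbors
        trajectory.append(X_next.copy())
        if np.linalg.norm(X_next - X) < tol:          # points have condensed onto centroids
            break
        X = X_next
    return trajectory
```
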
Approximation of Lipschitz Functions Using Deep Spline Neural Networks
SIAM Journal on Mathematics of Data Science. Pub Date: 2023-05-15. DOI: 10.1137/22m1504573
Authors: Sebastian Neumayer, Alexis Goujon, Pakshal Bohra, Michael Unser
Abstract: Although Lipschitz-constrained neural networks have many applications in machine learning, the design and training of expressive Lipschitz-constrained networks is very challenging. Since the popular rectified linear-unit networks have provable disadvantages in this setting, we propose using learnable spline activation functions with at least three linear regions instead. We prove that our choice is universal among all componentwise 1-Lipschitz activation functions in the sense that no other weight-constrained architecture can approximate a larger class of functions. Additionally, our choice is at least as expressive as the recently introduced non-componentwise GroupSort activation function for spectral-norm-constrained weights. The theoretical findings of this paper are consistent with previously published numerical results.
Citations: 3

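To make the proposed activation concrete, here is an illustrative PyTorch module (not the authors' implementation): a learnable, componentwise, continuous piecewise-linear activation with three linear regions whose slopes are squashed into (-1, 1), so the activation is 1-Lipschitz by construction. Breakpoint and slope parametrizations are assumptions for the sketch.

```python
import torch
import torch.nn as nn

class ThreeRegionSpline(nn.Module):
    """Learnable continuous piecewise-linear activation: three regions, slopes in (-1, 1)."""
    def __init__(self, channels):
        super().__init__()
        self.t1 = nn.Parameter(-torch.ones(channels))           # left breakpoint
        self.t2 = nn.Parameter(torch.ones(channels))            # right breakpoint
        self.raw_slopes = nn.Parameter(torch.ones(channels, 3)) # unconstrained slope parameters

    def forward(self, x):                                  # x: (batch, channels)
        s = torch.tanh(self.raw_slopes)                    # squash each slope into (-1, 1)
        b1 = torch.minimum(self.t1, self.t2)               # keep breakpoints ordered
        b2 = torch.maximum(self.t1, self.t2)
        return (s[:, 0] * torch.clamp(x - b1, max=0.0)     # left region, slope s0
                + s[:, 1] * torch.clamp(x, b1, b2)         # middle region, slope s1
                + s[:, 2] * torch.clamp(x - b2, min=0.0))  # right region, slope s2
```

Usage: `act = ThreeRegionSpline(64); y = act(torch.randn(8, 64))`. The three clamp terms sum to a continuous function whose slope on each region is the corresponding squashed parameter, which keeps the per-coordinate Lipschitz constant below 1.
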
Nonbacktracking Spectral Clustering of Nonuniform Hypergraphs
SIAM Journal on Mathematics of Data Science. Pub Date: 2023-04-26. DOI: 10.1137/22m1494713
Authors: Philip Chodrow, Nicole Eikmeier, Jamie Haddock
Abstract: Spectral methods offer a tractable, global framework for clustering in graphs via eigenvector computations on graph matrices. Hypergraph data, in which entities interact on edges of arbitrary size, poses challenges for matrix representations and therefore for spectral clustering. We study spectral clustering for nonuniform hypergraphs based on the hypergraph nonbacktracking operator. After reviewing the definition of this operator and its basic properties, we prove a theorem of Ihara–Bass type which allows eigenpair computations to take place on a smaller matrix, often enabling faster computation. We then propose an alternating algorithm for inference in a hypergraph stochastic blockmodel via linearized belief propagation, which involves a spectral clustering step again using nonbacktracking operators. We provide proofs related to this algorithm that both formalize and extend several previous results. We pose several conjectures about the limits of spectral methods and detectability in hypergraph stochastic blockmodels in general, supporting these with in-expectation analysis of the eigenpairs of our operators. We perform experiments on real and synthetic data that demonstrate the benefits of hypergraph methods over graph-based ones when interactions of different sizes carry different information about cluster structure.
Citations: 1

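For 2-uniform hypergraphs (ordinary graphs), the nonbacktracking operator reduces to the classical matrix indexed by directed arcs: B[(u,v),(v,w)] = 1 whenever w != u. The sketch below builds this graph case and performs a two-way spectral split; it is a simplified stand-in for the paper's nonuniform-hypergraph construction and its Ihara–Bass reduction, with a dense O(m^2) loop that only suits small examples.

```python
import numpy as np

def nonbacktracking_two_way_split(edges, n_nodes):
    arcs = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]  # two arcs per edge
    idx = {a: i for i, a in enumerate(arcs)}
    B = np.zeros((len(arcs), len(arcs)))
    for u, v in arcs:
        for v2, w in arcs:
            if v2 == v and w != u:            # walks may continue but not backtrack
                B[idx[(u, v)], idx[(v2, w)]] = 1.0
    vals, vecs = np.linalg.eig(B)             # B is non-symmetric; spectrum may be complex
    order = np.argsort(-vals.real)
    second = vecs[:, order[1]].real           # second leading eigenvector carries community signal
    scores = np.zeros(n_nodes)
    for (u, v), i in idx.items():
        scores[v] += second[i]                # pool arc entries onto their head nodes
    return (scores > 0).astype(int)           # sign split into two clusters
```
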
Mathematical Principles of Topological and Geometric Data Analysis
SIAM Journal on Mathematics of Data Science. Pub Date: 2023-01-01. DOI: 10.1007/978-3-031-33440-5
Authors: Parvaneh Joharinad, J. Jost
Citations: 3

Bi-Invariant Dissimilarity Measures for Sample Distributions in Lie Groups
SIAM Journal on Mathematics of Data Science. Pub Date: 2022-11-15. DOI: 10.1137/21m1410373
Authors: M. Hanik, H. Hege, C. V. Tycowicz
Pages: 1223-1249
Citations: 0

Robust Inference of Manifold Density and Geometry by Doubly Stochastic Scaling
SIAM Journal on Mathematics of Data Science. Pub Date: 2022-09-16. DOI: 10.48550/arXiv.2209.08004
Authors: Boris Landa, Xiuyuan Cheng
Pages: 589-614
Abstract: The Gaussian kernel and its traditional normalizations (e.g., row-stochastic) are popular approaches for assessing similarities between data points. Yet, they can be inaccurate under high-dimensional noise, especially if the noise magnitude varies considerably across the data, e.g., under heteroskedasticity or outliers. In this work, we investigate a more robust alternative: the doubly stochastic normalization of the Gaussian kernel. We consider a setting where points are sampled from an unknown density on a low-dimensional manifold embedded in high-dimensional space and corrupted by possibly strong, non-identically distributed, sub-Gaussian noise. We establish that the doubly stochastic affinity matrix and its scaling factors concentrate around certain population forms, and provide corresponding finite-sample probabilistic error bounds. We then utilize these results to develop several tools for robust inference under general high-dimensional noise. First, we derive a robust density estimator that reliably infers the underlying sampling density and can substantially outperform the standard kernel density estimator under heteroskedasticity and outliers. Second, we obtain estimators for the pointwise noise magnitudes, the pointwise signal magnitudes, and the pairwise Euclidean distances between clean data points. Lastly, we derive robust graph Laplacian normalizations that accurately approximate various manifold Laplacians, including the Laplace-Beltrami operator, improving over traditional normalizations in noisy settings. We exemplify our results in simulations and on real single-cell RNA-sequencing data. For the latter, we show that in contrast to traditional methods, our approach is robust to variability in technical noise levels across cell types.
Citations: 4

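The normalization at the heart of the paper can be sketched as a symmetric Sinkhorn iteration: find a positive diagonal scaling D such that W = D K D has unit row and column sums. The bandwidth and iteration count below are illustrative choices, not the paper's tuned settings.

```python
import numpy as np
from scipy.spatial.distance import cdist

def doubly_stochastic_affinity(X, bandwidth=1.0, iters=200):
    K = np.exp(-cdist(X, X, "sqeuclidean") / (2 * bandwidth**2))
    d = np.ones(K.shape[0])
    for _ in range(iters):
        d = np.sqrt(d / (K @ d))   # symmetric Sinkhorn update; fixed point gives d * (K d) = 1
    W = K * np.outer(d, d)         # W = D K D, approximately doubly stochastic
    return W, d
```

The scaling factors d are the quantities the paper shows concentrate around population forms, which is what enables the downstream density, noise-magnitude, and Laplacian estimators.
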
Convergence of a Piggyback-Style Method for the Differentiation of Solutions of Standard Saddle-Point Problems
SIAM Journal on Mathematics of Data Science. Pub Date: 2022-07-14. DOI: 10.1137/21m1455887
Authors: L. Bogensperger, A. Chambolle, T. Pock
Pages: 1003-1030
Abstract: We analyse a "piggyback"-style method for computing the derivative, with respect to the bilinear term, of a loss which depends on the solution of a convex-concave saddle-point problem. We attempt to derive guarantees for the algorithm under minimal regularity assumptions on the functions. Our final convergence results include possibly nonsmooth objectives. We illustrate the versatility of the proposed piggyback algorithm by learning optimized shearlet transforms, which are a class of popular sparsifying transforms in the field of imaging.
Citations: 8

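As a toy illustration of the piggyback idea (a caricature only: the paper treats nonsmooth objectives and proves convergence guarantees), the sketch below iterates a gradient descent-ascent map for the smooth quadratic saddle problem min_x max_y 0.5||x - a||^2 + <Kx, y> - 0.5||y||^2 alongside the directional derivatives of its iterates with respect to a perturbation dK of the bilinear term. The loss l(x) = 0.5||x||^2, the stepsizes, and the iteration budget are assumptions for the example.

```python
import numpy as np

def piggyback_directional(K, dK, a, tau=0.1, sigma=0.1, iters=2000):
    """Directional derivative along dK of l(x*) = 0.5 ||x*||^2 at the saddle point."""
    m, n = K.shape
    x, y = np.zeros(n), np.zeros(m)
    dx, dy = np.zeros(n), np.zeros(m)                      # derivatives of the iterates along dK
    for _ in range(iters):
        x_new = x - tau * ((x - a) + K.T @ y)              # gradient descent in x
        y_new = y + sigma * (K @ x - y)                    # gradient ascent in y
        dx_new = dx - tau * (dx + K.T @ dy + dK.T @ y)     # derivative of the x-update
        dy_new = dy + sigma * (K @ dx + dK @ x - dy)       # derivative of the y-update
        x, y, dx, dy = x_new, y_new, dx_new, dy_new
    return x, x @ dx          # saddle x* and d/d(eps) of l at K + eps * dK, at eps = 0
```

Iterating the derivative recursion in lockstep with the solver, rather than differentiating through a stored unrolled trajectory, is the design choice that gives the method its "piggyback" name.
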