Latest Articles in SIAM Journal on Mathematics of Data Science

Two Steps at a Time---Taking GAN Training in Stride with Tseng's Method
SIAM Journal on Mathematics of Data Science | Pub Date: 2020-06-16 | DOI: 10.1137/21m1420939
A. Böhm, Michael Sedlmayer, E. R. Csetnek, R. Boț
Abstract: Motivated by the training of Generative Adversarial Networks (GANs), we study methods for solving minimax problems with additional nonsmooth regularizers. We do so by employing monotone operator theory, in particular the Forward-Backward-Forward (FBF) method, which avoids the known issue of limit cycling by correcting each update with a second gradient evaluation. Furthermore, we propose a seemingly new scheme which recycles old gradients to mitigate this additional computational cost. In doing so we rediscover a known method, related to Optimistic Gradient Descent Ascent (OGDA). For both schemes we prove novel convergence rates for convex-concave minimax problems via a unifying approach. The derived error bounds are in terms of the gap function for the ergodic iterates. For the deterministic and the stochastic problem we show convergence rates of $\mathcal{O}(1/k)$ and $\mathcal{O}(1/\sqrt{k})$, respectively. We complement our theoretical results with empirical improvements in the training of Wasserstein GANs on the CIFAR10 dataset.
Citations: 13
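A minimal sketch of one Tseng/FBF iteration for $\min_x \max_y f(x, y) + g(x) - h(y)$, where the nonsmooth regularizers enter through proximal maps. All names (`grad_x`, `prox_g`, etc.) are illustrative assumptions, not the authors' code:

```python
def fbf_step(x, y, grad_x, grad_y, prox_g, prox_h, step):
    """One Forward-Backward-Forward iteration for a convex-concave saddle problem."""
    gx, gy = grad_x(x, y), grad_y(x, y)
    # Forward-backward step: gradient move plus proximal (backward) correction
    x_bar = prox_g(x - step * gx, step)
    y_bar = prox_h(y + step * gy, step)
    # Second forward step: re-evaluate gradients at the intermediate point;
    # this extra evaluation is what rules out limit cycling
    x_new = x_bar - step * (grad_x(x_bar, y_bar) - gx)
    y_new = y_bar + step * (grad_y(x_bar, y_bar) - gy)
    return x_new, y_new

# Toy usage: the bilinear problem min_x max_y x*y (saddle point at the origin),
# with g = h = 0 so the "prox" is the identity
import numpy as np
grad_x = lambda x, y: y
grad_y = lambda x, y: x
ident = lambda z, t: z
x, y = np.array([1.0]), np.array([1.0])
for _ in range(200):
    x, y = fbf_step(x, y, grad_x, grad_y, ident, ident, step=0.2)
```

Plain gradient descent-ascent cycles on this bilinear example; the second forward evaluation is what makes the iterates spiral in.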
FANOK: Knockoffs in Linear Time
SIAM Journal on Mathematics of Data Science | Pub Date: 2020-06-15 | DOI: 10.1137/20m1363698
Armin Askari, Quentin Rebjock, A. d’Aspremont, L. Ghaoui
Abstract: We describe a series of algorithms that efficiently implement Gaussian model-X knockoffs to control the false discovery rate on large-scale feature selection problems. Identifying the knockoff distribution requires solving a large-scale semidefinite program, for which we derive several efficient methods. One handles generic covariance matrices and has a complexity scaling as $O(p^3)$, where $p$ is the ambient dimension, while another assumes a rank-$k$ factor model on the covariance matrix to reduce this complexity bound to $O(pk^2)$. We also derive efficient procedures to both estimate factor models and sample knockoff covariates with complexity linear in the dimension. We test our methods on problems with $p$ as large as $500,000$.
Citations: 2
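For context, a dense-algebra sketch of the standard Gaussian model-X knockoff sampler that FANOK accelerates: the direct solve below costs $O(p^3)$, which is exactly the bottleneck the paper's factor-model route avoids. The diagonal `s` is assumed given (e.g., from the knockoff SDP):

```python
import numpy as np

def sample_gaussian_knockoffs(X, Sigma, s, rng=None):
    """Sample knockoffs X_tilde | X ~ N(mu, V) for rows of X drawn from N(0, Sigma),
    with mu = X (I - Sigma^{-1} D), V = 2D - D Sigma^{-1} D, and D = diag(s)."""
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    D = np.diag(s)
    SinvD = np.linalg.solve(Sigma, D)            # Sigma^{-1} diag(s): the O(p^3) step
    mu = X @ (np.eye(p) - SinvD)
    V = 2 * D - D @ SinvD
    # Symmetrize and jitter before factorizing, for numerical safety
    L = np.linalg.cholesky((V + V.T) / 2 + 1e-10 * np.eye(p))
    return mu + rng.standard_normal((n, p)) @ L.T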
Overparameterization and Generalization Error: Weighted Trigonometric Interpolation
SIAM Journal on Mathematics of Data Science | Pub Date: 2020-06-15 | DOI: 10.1137/21m1390955
Yuege Xie, H. Chou, H. Rauhut, Rachel A. Ward
Abstract: Motivated by the surprisingly good generalization properties of learned deep neural networks in overparameterized scenarios and by the related double descent phenomenon, this paper analyzes the relation between smoothness and low generalization error in an overparameterized linear learning problem. We study a random Fourier series model, where the task is to estimate the unknown Fourier coefficients from equidistant samples. We derive exact expressions for the generalization error of both plain and weighted least squares estimators. We show precisely how a bias towards smooth interpolants, in the form of weighted trigonometric interpolation, can lead to smaller generalization error in the overparameterized regime compared to the underparameterized regime. This provides insight into the power of overparameterization, which is common in modern machine learning.
Citations: 3
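The weighted estimator can be sketched as minimum weighted-norm interpolation: among all coefficient vectors fitting the $n$ samples, pick the one minimizing $\lVert W c \rVert_2$, with small weights on low frequencies favoring smooth interpolants. A simplified one-sided-spectrum sketch (the paper's random Fourier series setup is richer):

```python
import numpy as np

def weighted_min_norm_fit(y, K, w):
    """Solve min ||diag(w) c||_2 subject to F c = y, where F is the n x K matrix
    of Fourier features at n equispaced points and K > n (overparameterized).
    Weights w growing with frequency bias the fit toward smooth interpolants."""
    n = len(y)
    t = 2 * np.pi * np.arange(n) / n
    F = np.exp(1j * np.outer(t, np.arange(K)))    # n x K design matrix
    A = F / w                                      # F @ diag(1/w)
    # Minimum-norm solution of the underdetermined system A d = y, then c = d / w
    c = (A.conj().T @ np.linalg.solve(A @ A.conj().T, y)) / w
    return c
```

Setting `w = np.ones(K)` recovers plain min-norm least squares; increasing weights such as `w = 1 + np.arange(K)` implement the smoothness bias the abstract analyzes.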
The Trimmed Lasso: Sparse Recovery Guarantees and Practical Optimization by the Generalized Soft-Min Penalty
SIAM Journal on Mathematics of Data Science | Pub Date: 2020-05-18 | DOI: 10.1137/20M1330634
Tal Amir, R. Basri, B. Nadler
Abstract: We present a new approach to solve the sparse approximation or best subset selection problem, namely find a $k$-sparse vector $\mathbf{x} \in \mathbb{R}^d$ that minimizes the $\ell_2$ residual $\lVert A\mathbf{x} - \mathbf{y} \rVert_2$. We consider a regularized approach, whereby this residual is penalized by the nonconvex trimmed lasso, defined as the $\ell_1$-norm of $\mathbf{x}$ excluding its $k$ largest-magnitude entries. We prove that the trimmed lasso has several appealing theoretical properties, and in particular derive sparse recovery guarantees assuming successful optimization of the penalized objective. Next, we show empirically that directly optimizing this objective can be quite challenging. Instead, we propose a surrogate for the trimmed lasso, called the generalized soft-min. This penalty smoothly interpolates between the classical lasso and the trimmed lasso, while taking into account all possible $k$-sparse patterns. The generalized soft-min penalty involves summation over $\binom{d}{k}$ terms, yet we derive a polynomial-time algorithm to compute it. This, in turn, yields a practical method for the original sparse approximation problem. Via simulations, we demonstrate its competitive performance compared to the current state of the art.
Citations: 11
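The trimmed lasso itself is simple to evaluate: sort the magnitudes and sum all but the $k$ largest. A small sketch of the penalty and the penalized objective (the generalized soft-min surrogate and its polynomial-time evaluation are the paper's machinery and are not reproduced here):

```python
import numpy as np

def trimmed_lasso(x, k):
    # l1 norm of x excluding its k largest-magnitude entries;
    # it vanishes exactly when x is k-sparse, which makes it a sparsity penalty
    mags = np.sort(np.abs(x))              # ascending order
    return mags[:max(len(x) - k, 0)].sum()

def penalized_objective(A, x, y, k, lam):
    # Residual plus trimmed-lasso penalty, as in the abstract
    return np.linalg.norm(A @ x - y) + lam * trimmed_lasso(x, k)
```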
Spectral Discovery of Jointly Smooth Features for Multimodal Data
SIAM Journal on Mathematics of Data Science | Pub Date: 2020-04-09 | DOI: 10.1137/21M141590X
Or Yair, Felix Dietrich, Rotem Mulayoff, R. Talmon, I. Kevrekidis
Abstract: In this paper, we propose a spectral method for deriving functions that are jointly smooth on multiple observed manifolds. Our method is unsupervised and primarily consists of two steps. First, using kernels, we obtain a subspace spanning smooth functions on each manifold. Then, we apply a spectral method to the obtained subspaces and discover functions that are jointly smooth on all manifolds. We show analytically that our method is guaranteed to provide a set of orthogonal functions that are as jointly smooth as possible, ordered from the smoothest to the least smooth. In addition, we show that the proposed method can be efficiently extended to unseen data using the Nyström method. We demonstrate the proposed method on both simulated and real measured data and compare the results to nonlinear variants of the seminal Canonical Correlation Analysis (CCA). In particular, we show superior results for sleep stage identification. We also show how the proposed method can be leveraged to find minimal realizations of parameter spaces of nonlinear dynamical systems.
Citations: 6
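A compact sketch of the two-step recipe the abstract describes, under the assumption that the top eigenvectors of each kernel span the smooth functions on that manifold; variable names are illustrative:

```python
import numpy as np

def jointly_smooth_features(kernels, d, m):
    """kernels: list of n x n PSD kernel matrices, one per modality, over the
    same n samples. Returns m functions (columns) ordered from smoothest down."""
    bases = []
    for K in kernels:
        # Step 1: per modality, the top-d eigenvectors span the smoothest functions
        _, vecs = np.linalg.eigh(K)        # eigenvalues in ascending order
        bases.append(vecs[:, -d:])
    # Step 2: SVD of the stacked orthonormal bases; the leading left singular
    # vectors are the functions closest to lying in every subspace at once,
    # i.e., as jointly smooth as possible
    U, svals, _ = np.linalg.svd(np.hstack(bases), full_matrices=False)
    return U[:, :m], svals[:m]
```

A singular value near the number of modalities signals a function that is essentially smooth on all manifolds simultaneously.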
Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis
SIAM Journal on Mathematics of Data Science | Pub Date: 2020-03-16 | DOI: 10.1137/20m1331524
K. Khamaru, A. Pananjady, Feng Ruan, M. Wainwright, Michael I. Jordan
Abstract: We address the problem of policy evaluation in discounted Markov decision processes, and provide instance-dependent guarantees on the $\ell_\infty$-error under a generative model. We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms. Theory-inspired simulations show that the widely used temporal difference (TD) algorithm is strictly suboptimal when evaluated in a non-asymptotic setting, even when combined with Polyak-Ruppert iterate averaging. We remedy this issue by introducing and analyzing variance-reduced forms of stochastic approximation, showing that they achieve non-asymptotic, instance-dependent optimality up to logarithmic factors.
Citations: 39
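For concreteness, a tabular sketch of the baseline under discussion: TD(0) policy evaluation under a generative model, with Polyak-Ruppert iterate averaging. The paper's point is that even this averaged version is instance-wise suboptimal without variance reduction:

```python
import numpy as np

def td0_polyak(sample_next, r, gamma, n_states, n_iters, step, rng=None):
    """Tabular TD(0) with Polyak-Ruppert averaging.
    sample_next(s) draws s' ~ P(. | s), as a generative model permits."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.zeros(n_states)           # current value estimates
    theta_bar = np.zeros(n_states)       # running average of iterates
    for t in range(1, n_iters + 1):
        s = int(rng.integers(n_states))  # generative model: query any state
        s_next = sample_next(s)
        # TD(0): move V(s) toward the bootstrapped target r(s) + gamma * V(s')
        theta[s] += step * (r[s] + gamma * theta[s_next] - theta[s])
        theta_bar += (theta - theta_bar) / t
    return theta_bar
```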
Diffusion State Distances: Multitemporal Analysis, Fast Algorithms, and Applications to Biological Networks
SIAM Journal on Mathematics of Data Science | Pub Date: 2020-03-07 | DOI: 10.1137/20M1324089
L. Cowen, K. Devkota, Xiaozhe Hu, James M. Murphy, Kaiyi Wu
Abstract: Data-dependent metrics are powerful tools for learning the underlying structure of high-dimensional data. This article develops and analyzes a data-dependent metric known as diffusion state distance (DSD), which compares points using a data-driven diffusion process. Unlike related diffusion methods, DSDs incorporate information across time scales, which allows the intrinsic data structure to be inferred in a parameter-free manner. This article develops a theory for DSD based on the multitemporal emergence of mesoscopic equilibria in the underlying diffusion process. New algorithms for denoising and dimension reduction with DSD are also proposed and analyzed. These approaches are based on a weighted spectral decomposition of the underlying diffusion process, and experiments on synthetic datasets and real biological networks illustrate the efficacy of the proposed algorithms in terms of both speed and accuracy. Throughout, comparisons with related methods are made, in order to illustrate the distinct advantages of DSD for datasets exhibiting multiscale structure.
Citations: 5
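A truncated-sum sketch of the idea: accumulate expected visit counts over the first $k$ steps of the random walk and compare rows in $\ell_1$. The paper's contribution is the parameter-free multitemporal theory and the fast spectral algorithms that replace this naive computation:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def dsd_truncated(A, k=50):
    """Diffusion state distances from a symmetric adjacency matrix A
    (assumed to have no isolated nodes), truncated at k diffusion steps."""
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)   # row-stochastic random-walk matrix
    H = np.eye(n)                           # expected visit counts, t = 0
    Pt = np.eye(n)
    for _ in range(k):                      # accumulate across time scales
        Pt = Pt @ P
        H += Pt
    # DSD(u, v) = || H[u, :] - H[v, :] ||_1 over all pairs
    return squareform(pdist(H, metric="cityblock"))
```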
Diversity sampling is an implicit regularization for kernel methods
SIAM Journal on Mathematics of Data Science | Pub Date: 2020-02-20 | DOI: 10.1137/20M1320031
M. Fanuel, J. Schreurs, J. Suykens
Abstract: Kernel methods have achieved very good performance on large-scale regression and classification problems by using the Nyström method and preconditioning techniques. The Nyström approximation -- based on a subset of landmarks -- gives a low-rank approximation of the kernel matrix and is known to provide a form of implicit regularization. We further elaborate on the impact of sampling diverse landmarks for constructing the Nyström approximation in supervised as well as unsupervised kernel methods. By using Determinantal Point Processes (DPPs) for sampling, we obtain additional theoretical results concerning the interplay between diversity and regularization. Empirically, we demonstrate the advantages of training kernel methods on subsets made of diverse points. In particular, if the dataset has a dense bulk and a sparser tail, we show that Nyström kernel regression with diverse landmarks increases the accuracy of the regression in sparser regions of the dataset, relative to uniform landmark sampling. A greedy heuristic is also proposed to select diverse samples of significant size within large datasets when exact DPP sampling is not practically feasible.
Citations: 12
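A sketch of the two ingredients: the Nyström approximation from a landmark set, and a greedy pivoted-Cholesky-style selection that favors diverse landmarks when exact DPP sampling is impractical. The greedy routine here is a generic stand-in in the spirit of, not identical to, the paper's heuristic:

```python
import numpy as np

def nystrom(K, landmarks):
    # Low-rank approximation K ~= C W^+ C^T from the landmark columns
    C = K[:, landmarks]
    W = K[np.ix_(landmarks, landmarks)]
    return C @ np.linalg.pinv(W) @ C.T

def greedy_diverse_landmarks(K, m):
    """Greedily grow a subset whose kernel submatrix has large determinant,
    via partial pivoted Cholesky: a cheap proxy for DPP-style diversity."""
    n = K.shape[0]
    G = np.zeros((m, n))
    residual = np.diag(K).astype(float).copy()   # posterior variances
    chosen = []
    for i in range(m):
        j = int(np.argmax(residual))             # most "novel" remaining point
        chosen.append(j)
        G[i] = (K[j] - G[:i].T @ G[:i, j]) / np.sqrt(residual[j])
        residual = np.maximum(residual - G[i] ** 2, 0.0)
        residual[chosen] = 0.0                   # never re-pick a landmark
    return chosen
```

Points far from everything already selected keep a large residual variance, so the selection naturally spreads into the sparser tail of the dataset, which is exactly where the abstract reports accuracy gains.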
Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization
SIAM Journal on Mathematics of Data Science | Pub Date: 2020-02-13 | DOI: 10.1137/21m1394308
Samuel Horváth, Lihua Lei, Peter Richtárik, Michael I. Jordan
Abstract: Adaptivity is an important yet understudied property in modern optimization theory. The gap between the state-of-the-art theory and current practice is striking in that algorithms with desirable theoretical guarantees typically involve drastically different settings of hyperparameters, such as step-size schemes and batch sizes, in different regimes. Despite the appealing theoretical results, such divisive strategies provide little, if any, insight to practitioners to select algorithms that work broadly without tweaking the hyperparameters. In this work, blending the "geometrization" technique introduced by Lei and Jordan (2016) and the SARAH algorithm of Nguyen et al. (2017), we propose the Geometrized SARAH algorithm for nonconvex finite-sum and stochastic optimization. Our algorithm is proved to achieve adaptivity to both the magnitude of the target accuracy and the Polyak-Łojasiewicz (PL) constant, if present. In addition, it achieves the best-available convergence rate for non-PL objectives while simultaneously outperforming existing algorithms for PL objectives.
Citations: 18
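The SARAH building block that the Geometrized variant rests on, sketched for a finite-sum objective $\frac{1}{n}\sum_i f_i(w)$; the geometrization itself (randomized epoch lengths adapting to the PL constant) is the paper's contribution and is omitted here:

```python
import numpy as np

def sarah_epoch(grad_i, w, n, step, m, rng=None):
    """One SARAH epoch. grad_i(w, i) returns the gradient of component f_i at w."""
    rng = np.random.default_rng() if rng is None else rng
    # Anchor: full gradient at the epoch start
    v = sum(grad_i(w, i) for i in range(n)) / n
    w_prev, w = w, w - step * v
    for _ in range(m):
        i = int(rng.integers(n))
        # Recursive variance-reduced estimator: correct v with the
        # gradient difference at a single sampled component
        v = grad_i(w, i) - grad_i(w_prev, i) + v
        w_prev, w = w, w - step * v
    return w
```

Unlike SVRG, the estimator is recursive (each correction builds on the previous `v` rather than on the anchor), which is what yields its favorable nonconvex rates.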
Mean-Field Controls with Q-Learning for Cooperative MARL: Convergence and Complexity Analysis
SIAM Journal on Mathematics of Data Science | Pub Date: 2020-02-10 | DOI: 10.1137/20m1360700
Haotian Gu, Xin Guo, Xiaoli Wei, Renyuan Xu
Abstract: Multi-agent reinforcement learning (MARL), despite its popularity and empirical success, suffers from the curse of dimensionality. This paper builds a mathematical framework to approximate cooperative MARL by a mean-field control (MFC) framework, and shows that the approximation error is of order $O(1/\sqrt{N})$. By establishing an appropriate form of the dynamic programming principle for both the value function and the Q-function, it proposes a model-free kernel-based Q-learning algorithm (MFC-K-Q), which is shown to have a linear convergence rate, the first of its kind in the MARL literature. It further establishes that the convergence rate and the sample complexity of MFC-K-Q are independent of the number of agents $N$. Empirical studies of a network traffic congestion problem demonstrate that MFC-K-Q outperforms existing MARL algorithms when $N$ is large, for instance when $N > 50$.
Citations: 37
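A toy sketch of the lifted viewpoint: tabular Q-learning where the "state" is the empirical distribution $\mu$ of the agent population, discretized so it can index a table. The paper's MFC-K-Q additionally uses kernel regression on the probability simplex, which is not shown; everything below is an illustrative assumption:

```python
import numpy as np

def discretize(mu_array, bins=10):
    # Project an empirical distribution onto a grid so it can serve as a dict key
    return tuple(np.round(np.asarray(mu_array) * bins).astype(int))

def mean_field_q_update(Q, mu, a, reward, mu_next, actions, gamma=0.99, lr=0.1):
    """One Q-learning step on the lifted mean-field MDP.
    mu, mu_next: discretized population distributions (tuples from discretize)."""
    best_next = max(Q.get((mu_next, b), 0.0) for b in actions)
    target = reward + gamma * best_next          # Bellman target on the lifted MDP
    Q[(mu, a)] = (1 - lr) * Q.get((mu, a), 0.0) + lr * target
    return Q
```

The table is indexed by distributions rather than joint agent states, which is why the resulting complexity can be independent of the number of agents $N$.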