Structure discovery in PAC-learning by random projections

IF 2.9 | Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Machine Learning Pub Date : 2024-03-26 DOI:10.1007/s10994-024-06531-0

Citation count: 0

Abstract

High-dimensional learning is data-hungry in general; however, many natural data sources and real-world learning problems possess some hidden low-complexity structure that permits effective learning from relatively small sample sizes. We are interested in the general question of how to discover and exploit such hidden benign traits when problem-specific prior knowledge is insufficient. In this work, we address this question through random projection’s ability to expose structure. We study both compressive learning and high-dimensional learning from this angle by introducing the notions of compressive distortion and compressive complexity. We give user-friendly PAC bounds in the agnostic setting that are formulated in terms of these quantities, and we show that our bounds can be tight when these quantities are small. We then instantiate these quantities in several examples of particular learning problems, demonstrating their ability to discover interpretable structural characteristics that make high-dimensional instances of these problems solvable to good approximation in a random linear subspace. In the examples considered, these turn out to resemble some familiar benign traits such as the margin, the margin distribution, the intrinsic dimension, the spectral decay of the data covariance, or the norms of parameters, while our general notions of compressive distortion and compressive complexity serve to unify these, and may be used to discover benign structural traits for other PAC-learnable problems.
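
To make the idea of learning "in a random linear subspace" concrete, the following is a minimal sketch of compressive learning on synthetic data. It is not the paper's method and does not compute compressive distortion or compressive complexity; it only illustrates the setting. All names (`make_low_dim_data`, `fit_ridge_classifier`, `run_demo`), the data model (points in a hidden r-dimensional subspace with a margin along one latent axis), and the ridge least-squares classifier are assumptions chosen for illustration.

```python
import numpy as np


def make_low_dim_data(n, basis, rng, margin=1.0):
    """Sample n points in R^d that lie in the column span of `basis` (d x r).

    Hypothetical data model: labels are +/-1 and the classes are separated
    by a margin along the first latent coordinate.
    """
    r = basis.shape[1]
    y = rng.choice([-1.0, 1.0], size=n)
    z = rng.normal(size=(n, r))
    z[:, 0] = y * (margin + np.abs(z[:, 0]))  # enforce the margin
    return z @ basis.T, y


def fit_ridge_classifier(X, y, ridge=1e-3):
    """Ridge-regularised least squares on +/-1 labels (an assumed learner)."""
    dim = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge * np.eye(dim), X.T @ y)


def accuracy(w, X, y):
    """Fraction of points whose predicted sign matches the label."""
    return float(np.mean(np.sign(X @ w) == y))


def run_demo(d=500, r=5, k=20, n_train=200, n_test=1000, seed=0):
    """Compare learning in the ambient space R^d with learning in R^k."""
    rng = np.random.default_rng(seed)
    # Orthonormal basis of a hidden r-dimensional subspace of R^d.
    basis, _ = np.linalg.qr(rng.normal(size=(d, r)))
    X_tr, y_tr = make_low_dim_data(n_train, basis, rng)
    X_te, y_te = make_low_dim_data(n_test, basis, rng)

    # Gaussian random projection R^d -> R^k with i.i.d. N(0, 1/k) entries.
    R = rng.normal(scale=1.0 / np.sqrt(k), size=(k, d))

    w_full = fit_ridge_classifier(X_tr, y_tr)
    w_proj = fit_ridge_classifier(X_tr @ R.T, y_tr)
    return accuracy(w_full, X_te, y_te), accuracy(w_proj, X_te @ R.T, y_te)


if __name__ == "__main__":
    acc_full, acc_proj = run_demo()
    print(f"ambient-space accuracy:   {acc_full:.3f}")
    print(f"projected-space accuracy: {acc_proj:.3f}")
```

Because the synthetic data has low intrinsic dimension and a margin, the classifier trained on the k-dimensional projection should perform comparably to the one trained in the ambient space, which is the kind of benign structure the abstract refers to.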

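Source journal

Machine Learning (Engineering and Technology - Computer Science: Artificial Intelligence)

CiteScore: 11.00

Self-citation rate: 2.70%

Articles published: 162

Review time: 3 months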

Journal description: Machine Learning serves as a global platform dedicated to computational approaches in learning. The journal reports substantial findings on diverse learning methods applied to various problems, offering support through empirical studies, theoretical analysis, or connections to psychological phenomena. It demonstrates the application of learning methods to solve significant problems and aims to enhance the conduct of machine learning research with a focus on verifiable and replicable evidence in published papers.