{"title":"Fundamental limits of weak learnability in high-dimensional multi-index models","authors":"Emanuele Troiani, Yatin Dandi, Leonardo Defilippis, Lenka Zdeborová, Bruno Loureiro, Florent Krzakala","doi":"arxiv-2405.15480","DOIUrl":null,"url":null,"abstract":"Multi-index models -- functions which only depend on the covariates through a\nnon-linear transformation of their projection on a subspace -- are a useful\nbenchmark for investigating feature learning with neural networks. This paper\nexamines the theoretical boundaries of learnability in this hypothesis class,\nfocusing particularly on the minimum sample complexity required for weakly\nrecovering their low-dimensional structure with first-order iterative\nalgorithms, in the high-dimensional regime where the number of samples is\n$n=\\alpha d$ is proportional to the covariate dimension $d$. Our findings\nunfold in three parts: (i) first, we identify under which conditions a\n\\textit{trivial subspace} can be learned with a single step of a first-order\nalgorithm for any $\\alpha\\!>\\!0$; (ii) second, in the case where the trivial\nsubspace is empty, we provide necessary and sufficient conditions for the\nexistence of an {\\it easy subspace} consisting of directions that can be\nlearned only above a certain sample complexity $\\alpha\\!>\\!\\alpha_c$. The\ncritical threshold $\\alpha_{c}$ marks the presence of a computational phase\ntransition, in the sense that no efficient iterative algorithm can succeed for\n$\\alpha\\!<\\!\\alpha_c$. In a limited but interesting set of really hard\ndirections -- akin to the parity problem -- $\\alpha_c$ is found to diverge.\nFinally, (iii) we demonstrate that interactions between different directions\ncan result in an intricate hierarchical learning phenomenon, where some\ndirections can be learned sequentially when coupled to easier ones. Our\nanalytical approach is built on the optimality of approximate message-passing\nalgorithms among first-order iterative methods, delineating the fundamental\nlearnability limit across a broad spectrum of algorithms, including neural\nnetworks trained with gradient descent.","PeriodicalId":501066,"journal":{"name":"arXiv - PHYS - Disordered Systems and Neural Networks","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - PHYS - Disordered Systems and Neural Networks","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2405.15480","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Multi-index models -- functions that depend on the covariates only through a
non-linear transformation of their projection onto a subspace -- are a useful
benchmark for investigating feature learning with neural networks. This paper
examines the theoretical boundaries of learnability in this hypothesis class,
focusing particularly on the minimum sample complexity required for weakly
recovering their low-dimensional structure with first-order iterative
algorithms, in the high-dimensional regime where the number of samples
$n=\alpha d$ is proportional to the covariate dimension $d$. Our findings
unfold in three parts: (i) first, we identify under which conditions a
\textit{trivial subspace} can be learned with a single step of a first-order
algorithm for any $\alpha\!>\!0$; (ii) second, in the case where the trivial
subspace is empty, we provide necessary and sufficient conditions for the
existence of an \textit{easy subspace} consisting of directions that can be
learned only above a certain sample complexity $\alpha\!>\!\alpha_c$. The
critical threshold $\alpha_{c}$ marks the presence of a computational phase
transition, in the sense that no efficient iterative algorithm can succeed for
$\alpha\!<\!\alpha_c$. In a limited but interesting set of hard
directions -- akin to the parity problem -- $\alpha_c$ is found to diverge.
Finally, (iii) we demonstrate that interactions between different directions
can result in an intricate hierarchical learning phenomenon, where some
directions can be learned sequentially when coupled to easier ones. Our
analytical approach is built on the optimality of approximate message-passing
algorithms among first-order iterative methods, delineating the fundamental
learnability limit across a broad spectrum of algorithms, including neural
networks trained with gradient descent.
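
To make the setting above concrete, here is a minimal formalization under standard conventions for multi-index models; the notation (in particular $W$, $g$, and the overlap criterion) is ours and may differ from the paper's:
\[
y_i = g\!\left(\frac{W x_i}{\sqrt{d}}\right), \qquad x_i \sim \mathcal{N}(0, I_d), \qquad W \in \mathbb{R}^{r \times d}, \; r = O(1),
\]
\[
n = \alpha d, \qquad d \to \infty, \qquad \alpha = \Theta(1) .
\]
Weak recovery of a direction $w$ in the row span of $W$ means producing an estimate $\hat{w}$ whose normalized overlap $|\langle \hat{w}, w\rangle| / (\|\hat{w}\|\,\|w\|)$ stays bounded away from zero with high probability as $d \to \infty$.

The \textit{trivial subspace} of point (i) can be illustrated with a toy experiment: for a single-index link with a non-zero first Hermite coefficient (e.g. ReLU), a single gradient step of a linear predictor started at zero already correlates with the hidden direction at any $\alpha > 0$. The sketch below is purely illustrative and is not the paper's approximate message-passing analysis; all parameter choices are ours.

import numpy as np

rng = np.random.default_rng(0)
d, alpha = 1000, 2.0
n = int(alpha * d)

w = rng.standard_normal(d)
w /= np.linalg.norm(w)              # hidden unit-norm direction
X = rng.standard_normal((n, d))     # covariates x_i ~ N(0, I_d)
y = np.maximum(X @ w, 0.0)          # ReLU link: non-zero first Hermite coefficient

# One squared-loss gradient step for a linear predictor theta, starting from
# theta = 0, is proportional to the plain correlation estimate below.
m = X.T @ y / n
m /= np.linalg.norm(m)

overlap = abs(m @ w)                # order one, vs. ~1/sqrt(d) for a random direction
print(f"overlap with hidden direction: {overlap:.3f}")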