Quantifying relevance in learning and inference

Matteo Marsili, Yasser Roudi
Physics Reports, vol. 963, pp. 1–43 (2022)
DOI: 10.1016/j.physrep.2022.03.001
URL: https://www.sciencedirect.com/science/article/pii/S0370157322000862
Impact factor: 23.9 · JCR Q1 (Physics, Multidisciplinary) · Cited by: 10

Abstract

Learning is a distinctive feature of intelligent behaviour. High-throughput experimental data and Big Data promise to open new windows on complex systems such as cells, the brain or our societies. Yet the puzzling success of Artificial Intelligence and Machine Learning shows that we still have a poor conceptual understanding of learning. These applications push statistical inference into uncharted territories where data is high-dimensional and scarce, and prior information on “true” models is scant if not totally absent. Here we review recent progress on understanding learning, based on the notion of “relevance”. The relevance, as we define it here, quantifies the amount of information that a dataset or the internal representation of a learning machine contains about the generative model of the data. This allows us to define maximally informative samples on one hand, and optimal learning machines on the other. These are ideal limits of samples and of machines that contain the maximal amount of information about the unknown generative process at a given resolution (or level of compression). Both ideal limits exhibit critical features in the statistical sense: maximally informative samples are characterised by a power-law frequency distribution (statistical criticality) and optimal learning machines by an anomalously large susceptibility. The trade-off between resolution (i.e. compression) and relevance separates the regime of noisy representations from that of lossy compression; the two regimes are divided by a special point characterised by Zipf’s law statistics. This identifies samples obeying Zipf’s law as the most compressed lossless representations that are optimal in the sense of maximal relevance. Criticality in optimal learning machines manifests as an exponential degeneracy of energy levels, which leads to unusual thermodynamic properties. This distinctive feature is consistent with the invariance of the classification under coarse graining of the output, which is a desirable property of learning machines. This theoretical framework is corroborated by empirical analysis showing (i) how the concept of relevance can be used to identify relevant variables in high-dimensional inference, and (ii) that widely used machine learning architectures approach the ideal limit of optimal learning machines reasonably well, within the limits of the data with which they are trained.
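The resolution and relevance referred to above are both entropies computed from the empirical frequencies of a sample. Below is a minimal sketch, assuming the definitions used in the review (resolution as the entropy of the state frequencies, relevance as the entropy of the frequency-of-frequencies distribution); the function name and the toy sample are illustrative and not taken from the paper.

```python
# Minimal sketch of the resolution/relevance quantities described in the abstract.
# Assumption: for a sample of N observations, resolution H[s] is the entropy of the
# empirical state frequencies k_s/N, and relevance H[k] is the entropy of the
# frequency-of-frequencies distribution k*m_k/N, where m_k is the number of
# distinct states observed exactly k times. Names here are illustrative.

from collections import Counter
from math import log

def resolution_and_relevance(sample):
    """Return (H[s], H[k]) in nats for a sequence of discrete observations."""
    N = len(sample)
    counts = Counter(sample)                  # k_s: how often each state s occurs
    # Resolution: H[s] = -sum_s (k_s/N) * log(k_s/N)
    H_s = -sum((k / N) * log(k / N) for k in counts.values())
    # m_k: number of distinct states observed exactly k times
    m = Counter(counts.values())
    # Relevance: H[k] = -sum_k (k*m_k/N) * log(k*m_k/N)
    H_k = -sum((k * mk / N) * log(k * mk / N) for k, mk in m.items())
    return H_s, H_k

# Toy usage: a small sample with repeated states
sample = ["a", "a", "a", "b", "b", "c", "d", "d", "d", "d"]
print(resolution_and_relevance(sample))
```

Evaluating both quantities as the sample is progressively coarse grained (for example, by merging states) would trace out the resolution versus relevance trade-off discussed in the abstract.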

Source journal
Physics Reports (Physics: Multidisciplinary)
CiteScore: 56.10
Self-citation rate: 0.70%
Articles published per year: 102
Review time: 9.1 weeks
About the journal: Physics Reports keeps the active physicist up-to-date on developments in a wide range of topics by publishing timely reviews which are more extensive than literature surveys but normally shorter than a full monograph. Each report deals with one specific subject and is generally published in a separate volume. These reviews are specialist in nature but contain enough introductory material to make the main points intelligible to a non-specialist. The reader will not only be able to distinguish important developments and trends in physics but will also find a sufficient number of references to the original literature.