Local kernel renormalization as a mechanism for feature learning in overparametrized convolutional neural networks

IF 14.7 | Q1, Multidisciplinary Sciences | Zone 1, comprehensive journal
R. Aiudi, R. Pacelli, P. Baglioni, A. Vezzani, R. Burioni, P. Rotondo
{"title":"局部核重整化作为一种超参数化卷积神经网络特征学习机制","authors":"R. Aiudi, R. Pacelli, P. Baglioni, A. Vezzani, R. Burioni, P. Rotondo","doi":"10.1038/s41467-024-55229-3","DOIUrl":null,"url":null,"abstract":"<p>Empirical evidence shows that fully-connected neural networks in the infinite-width limit (lazy training) eventually outperform their finite-width counterparts in most computer vision tasks; on the other hand, modern architectures with convolutional layers often achieve optimal performances in the finite-width regime. In this work, we present a theoretical framework that provides a rationale for these differences in one-hidden-layer networks; we derive an effective action in the so-called proportional limit for an architecture with one convolutional hidden layer and compare it with the result available for fully-connected networks. Remarkably, we identify a completely different form of kernel renormalization: whereas the kernel of the fully-connected architecture is just globally renormalized by a single scalar parameter, the convolutional kernel undergoes a local renormalization, meaning that the network can select the local components that will contribute to the final prediction in a data-dependent way. This finding highlights a simple mechanism for feature learning that can take place in overparametrized shallow convolutional neural networks, but not in shallow fully-connected architectures or in locally connected neural networks without weight sharing.</p>","PeriodicalId":19066,"journal":{"name":"Nature Communications","volume":"91 1","pages":""},"PeriodicalIF":14.7000,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Local kernel renormalization as a mechanism for feature learning in overparametrized convolutional neural networks\",\"authors\":\"R. Aiudi, R. Pacelli, P. Baglioni, A. Vezzani, R. Burioni, P. Rotondo\",\"doi\":\"10.1038/s41467-024-55229-3\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Empirical evidence shows that fully-connected neural networks in the infinite-width limit (lazy training) eventually outperform their finite-width counterparts in most computer vision tasks; on the other hand, modern architectures with convolutional layers often achieve optimal performances in the finite-width regime. In this work, we present a theoretical framework that provides a rationale for these differences in one-hidden-layer networks; we derive an effective action in the so-called proportional limit for an architecture with one convolutional hidden layer and compare it with the result available for fully-connected networks. Remarkably, we identify a completely different form of kernel renormalization: whereas the kernel of the fully-connected architecture is just globally renormalized by a single scalar parameter, the convolutional kernel undergoes a local renormalization, meaning that the network can select the local components that will contribute to the final prediction in a data-dependent way. 
This finding highlights a simple mechanism for feature learning that can take place in overparametrized shallow convolutional neural networks, but not in shallow fully-connected architectures or in locally connected neural networks without weight sharing.</p>\",\"PeriodicalId\":19066,\"journal\":{\"name\":\"Nature Communications\",\"volume\":\"91 1\",\"pages\":\"\"},\"PeriodicalIF\":14.7000,\"publicationDate\":\"2025-01-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Nature Communications\",\"FirstCategoryId\":\"103\",\"ListUrlMain\":\"https://doi.org/10.1038/s41467-024-55229-3\",\"RegionNum\":1,\"RegionCategory\":\"综合性期刊\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"MULTIDISCIPLINARY SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nature Communications","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.1038/s41467-024-55229-3","RegionNum":1,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
Citations: 0

Abstract

Empirical evidence shows that fully-connected neural networks in the infinite-width limit (lazy training) eventually outperform their finite-width counterparts in most computer vision tasks; on the other hand, modern architectures with convolutional layers often achieve optimal performances in the finite-width regime. In this work, we present a theoretical framework that provides a rationale for these differences in one-hidden-layer networks; we derive an effective action in the so-called proportional limit for an architecture with one convolutional hidden layer and compare it with the result available for fully-connected networks. Remarkably, we identify a completely different form of kernel renormalization: whereas the kernel of the fully-connected architecture is just globally renormalized by a single scalar parameter, the convolutional kernel undergoes a local renormalization, meaning that the network can select the local components that will contribute to the final prediction in a data-dependent way. This finding highlights a simple mechanism for feature learning that can take place in overparametrized shallow convolutional neural networks, but not in shallow fully-connected architectures or in locally connected neural networks without weight sharing.
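The contrast between global and local kernel renormalization described in the abstract can be pictured with a short numerical sketch. The snippet below is only illustrative and is not the effective action derived in the paper: the linear patch kernels, the toy data, and the renormalization values q_global and q_local are all hypothetical placeholders. It simply shows how a single scalar rescales a fully-connected kernel as a whole, while per-patch parameters let a convolutional-style kernel weight its local components differently.

# Illustrative sketch (not the paper's derivation): global vs. local
# kernel renormalization. All kernels and the values q_global / q_local
# are hypothetical placeholders chosen for demonstration.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: N inputs of dimension D, split into P non-overlapping patches.
N, D, P = 8, 12, 3
patch = D // P
X = rng.normal(size=(N, D))

# Fully-connected case: one kernel over the whole input,
# rescaled by a single scalar (global renormalization).
K_fc = X @ X.T / D
q_global = 0.7                       # hypothetical renormalization scalar
K_fc_renormalized = q_global * K_fc

# Convolutional case: the kernel decomposes into patch-wise components,
# each rescaled by its own parameter (local renormalization).
K_patches = np.stack([
    X[:, i*patch:(i+1)*patch] @ X[:, i*patch:(i+1)*patch].T / patch
    for i in range(P)
])                                    # shape (P, N, N)
q_local = np.array([1.4, 0.1, 0.9])   # hypothetical per-patch parameters
K_conv_renormalized = np.tensordot(q_local, K_patches, axes=1) / P

# A patch with a small q_i contributes little to the final prediction:
# this is the sense in which local renormalization lets the network
# select local components in a data-dependent way.
print(K_fc_renormalized.shape, K_conv_renormalized.shape)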


Source journal: Nature Communications
CiteScore: 24.90
Self-citation rate: 2.40%
Articles published: 6928
Review time: 3.7 months

Journal description: Nature Communications, an open-access journal, publishes high-quality research spanning all areas of the natural sciences. Papers featured in the journal showcase significant advances relevant to specialists in each respective field. With a 2-year impact factor of 16.6 (2022) and a median time of 8 days from submission to the first editorial decision, Nature Communications is committed to rapid dissemination of research findings. As a multidisciplinary journal, it welcomes contributions from biological, health, physical, chemical, Earth, social, mathematical, applied, and engineering sciences, aiming to highlight important breakthroughs within each domain.