{"title":"卷积神经网络的广义熵稀疏化研究。","authors":"Tin Barisin, Illia Horenko","doi":"10.1162/neco.a.21","DOIUrl":null,"url":null,"abstract":"<p><p>Convolutional neural networks (CNNs) are reported to be overparametrized. The search for optimal (minimal) and sufficient architecture is an NP-hard problem: if the network has $N$ neurons, then there are 2$^{N}$ possibilities to connect them-and therefore 2$^{N}$ possible architectures and 2$^{N}$ Boolean hyperparameters to encode them. Selecting the best possible hyperparameter out of them becomes an $N^{p}$ -hard problem since 2$^{N}$ grows in $N$ faster then any polynomial $N^{p}$. Here, we introduce a layer-by-layer data-driven pruning method based on the mathematical idea aiming at a computationally scalable entropic relaxation of the pruning problem. The sparse subnetwork is found from the pretrained (full) CNN using the network entropy minimization as a sparsity constraint. This allows deploying a numerically scalable algorithm with a sublinear scaling cost. The method is validated on several benchmarks (architectures): on MNIST (LeNet), resulting in sparsity of 55% to 84% and loss in accuracy of just 0.1% to 0.5%, and on CIFAR-10 (VGG-16, ResNet18), resulting in sparsity of 73% to 89% and loss in accuracy of 0.1% to 0.5%.</p>","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"1-29"},"PeriodicalIF":2.1000,"publicationDate":"2025-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Toward Generalized Entropic Sparsification for Convolutional Neural Networks.\",\"authors\":\"Tin Barisin, Illia Horenko\",\"doi\":\"10.1162/neco.a.21\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Convolutional neural networks (CNNs) are reported to be overparametrized. The search for optimal (minimal) and sufficient architecture is an NP-hard problem: if the network has $N$ neurons, then there are 2$^{N}$ possibilities to connect them-and therefore 2$^{N}$ possible architectures and 2$^{N}$ Boolean hyperparameters to encode them. Selecting the best possible hyperparameter out of them becomes an $N^{p}$ -hard problem since 2$^{N}$ grows in $N$ faster then any polynomial $N^{p}$. Here, we introduce a layer-by-layer data-driven pruning method based on the mathematical idea aiming at a computationally scalable entropic relaxation of the pruning problem. The sparse subnetwork is found from the pretrained (full) CNN using the network entropy minimization as a sparsity constraint. This allows deploying a numerically scalable algorithm with a sublinear scaling cost. 
The method is validated on several benchmarks (architectures): on MNIST (LeNet), resulting in sparsity of 55% to 84% and loss in accuracy of just 0.1% to 0.5%, and on CIFAR-10 (VGG-16, ResNet18), resulting in sparsity of 73% to 89% and loss in accuracy of 0.1% to 0.5%.</p>\",\"PeriodicalId\":54731,\"journal\":{\"name\":\"Neural Computation\",\"volume\":\" \",\"pages\":\"1-29\"},\"PeriodicalIF\":2.1000,\"publicationDate\":\"2025-07-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neural Computation\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1162/neco.a.21\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Computation","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1162/neco.a.21","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Toward Generalized Entropic Sparsification for Convolutional Neural Networks.
Convolutional neural networks (CNNs) are reported to be overparametrized. The search for an optimal (minimal) and sufficient architecture is an NP-hard problem: if the network has $N$ neurons, then there are $2^N$ possibilities to connect them, and therefore $2^N$ possible architectures and $2^N$ Boolean hyperparameters to encode them. Selecting the best hyperparameter configuration among them is NP-hard, since $2^N$ grows in $N$ faster than any polynomial $N^p$. Here, we introduce a layer-by-layer, data-driven pruning method based on a mathematical idea aiming at a computationally scalable entropic relaxation of the pruning problem. The sparse subnetwork is found from the pretrained (full) CNN using network entropy minimization as a sparsity constraint. This allows deploying a numerically scalable algorithm with a sublinear scaling cost. The method is validated on several benchmarks (architectures): on MNIST (LeNet), resulting in sparsity of 55% to 84% with a loss in accuracy of just 0.1% to 0.5%, and on CIFAR-10 (VGG-16, ResNet18), resulting in sparsity of 73% to 89% with a loss in accuracy of 0.1% to 0.5%.
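Because the abstract describes the method only at a high level, the sketch below is a minimal, hypothetical illustration of layer-by-layer channel pruning driven by an entropy-style importance criterion; it is not the authors' algorithm. The `channel_importance`, `entropy`, and `prune_layer` helpers, the L1-norm importance scores, and the `keep_mass` threshold are all assumptions introduced purely for illustration.

```python
# Hypothetical sketch of entropy-guided, layer-wise channel pruning.
# NOT the authors' method: it only illustrates ranking channels by a
# probability-like importance distribution and keeping the smallest
# subset that retains most of that distribution's mass.
import numpy as np


def channel_importance(weights: np.ndarray) -> np.ndarray:
    """Normalize per-channel L1 norms into a probability-like importance vector.

    weights: assumed conv-kernel layout (out_channels, in_channels, k, k).
    """
    scores = np.abs(weights).sum(axis=(1, 2, 3))
    return scores / scores.sum()


def entropy(p: np.ndarray) -> float:
    """Shannon entropy of an importance distribution (zeros ignored)."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def prune_layer(weights: np.ndarray, keep_mass: float = 0.95) -> np.ndarray:
    """Return a boolean mask of output channels to keep for one layer.

    Channels are sorted by importance; the smallest set whose cumulative
    importance reaches `keep_mass` is retained. A low-entropy importance
    distribution (mass concentrated on few channels) prunes aggressively.
    """
    p = channel_importance(weights)
    order = np.argsort(p)[::-1]          # most important channels first
    cum = np.cumsum(p[order])
    n_keep = int(np.searchsorted(cum, keep_mass) + 1)
    mask = np.zeros(p.shape[0], dtype=bool)
    mask[order[:n_keep]] = True
    return mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fake pretrained layer: 64 output channels with uneven magnitudes.
    w = rng.normal(size=(64, 32, 3, 3)) * rng.exponential(size=(64, 1, 1, 1))
    mask = prune_layer(w, keep_mass=0.9)
    print(f"entropy of importance: {entropy(channel_importance(w)):.3f}")
    print(f"kept {mask.sum()} / {mask.size} channels")
```

Applied independently to each layer of a pretrained network, such a mask would yield a sparse subnetwork in one pass, which is in the spirit of the layer-by-layer, post-training approach the abstract describes; the paper itself should be consulted for the actual entropic formulation.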
Journal Introduction:
Neural Computation is uniquely positioned at the crossroads between neuroscience and TMCS and welcomes the submission of original papers from all areas of TMCS, including: Advanced experimental design; Analysis of chemical sensor data; Connectomic reconstructions; Analysis of multielectrode and optical recordings; Genetic data for cell identity; Analysis of behavioral data; Multiscale models; Analysis of molecular mechanisms; Neuroinformatics; Analysis of brain imaging data; Neuromorphic engineering; Principles of neural coding, computation, circuit dynamics, and plasticity; Theories of brain function.