An Information Theory-Inspired Strategy for Automated Network Pruning

Impact factor 9.3; Zone 2 (Computer Science); JCR Q1 (Computer Science, Artificial Intelligence)

International Journal of Computer Vision; Pub Date: 2025-05-12; DOI: 10.1007/s11263-025-02437-z

Xiawu Zheng, Yuexiao Ma, Teng Xi, Gang Zhang, Errui Ding, Yuchao Li, Jie Chen, Yonghong Tian, Rongrong Ji

Citations: 0

Abstract

Despite their superior performance on many computer vision tasks, deep neural networks demand substantial computing power and memory. Most existing network pruning methods require laborious human effort and prohibitive computational resources, especially when the resource constraints change. In practice, this limits the application of model compression when a model must be deployed on a wide range of devices. Moreover, existing methods lack theoretical guidance, in particular regarding their effect on the generalization error. In this paper, we propose an information theory-inspired strategy for automated network pruning. The principle behind our method is the information bottleneck theory. Concretely, we introduce a new theorem showing that hidden representations should compress information with respect to each other to achieve better generalization. Based on this, we introduce the normalized Hilbert-Schmidt Independence Criterion (HSIC) on network activations as a stable and general indicator of layer importance. Given a resource constraint, we integrate the HSIC indicator with the constraint to transform the architecture search problem into a linear programming problem with quadratic constraints. Such a problem is easily solved by a convex optimization method within a few seconds. We also provide a rigorous proof that optimizing the normalized HSIC simultaneously minimizes the mutual information between different layers. Without any search process, our method achieves better compression trade-offs than state-of-the-art compression algorithms. For instance, on ResNet-50, we achieve a 45.3% reduction in FLOPs with 75.75% top-1 accuracy on ImageNet. Code is available at https://github.com/MAC-AutoML/ITPruner.
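The normalized HSIC indicator named in the abstract can be illustrated with a short sketch. This is not the authors' implementation (the linked repository is the reference for that). It assumes a linear kernel and the standard biased empirical HSIC estimator, tr(KHLH)/(n-1)^2, and the function names `centered_gram`, `hsic`, and `normalized_hsic` are hypothetical.

```python
import numpy as np


def centered_gram(x):
    """Double-centered linear-kernel Gram matrix of activations.

    x has shape (n_samples, ...); trailing dimensions are flattened.
    The linear kernel is an assumption made for this sketch.
    """
    x = np.asarray(x, dtype=np.float64).reshape(len(x), -1)
    k = x @ x.T                              # K[i, j] = <x_i, x_j>
    n = k.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n      # centering matrix H
    return h @ k @ h


def hsic(kc, lc):
    """Biased empirical HSIC from two centered Gram matrices."""
    n = kc.shape[0]
    # For symmetric matrices, trace(kc @ lc) equals the elementwise sum.
    return np.sum(kc * lc) / (n - 1) ** 2


def normalized_hsic(x, y):
    """nHSIC(x, y) = HSIC(x, y) / sqrt(HSIC(x, x) * HSIC(y, y))."""
    kc, lc = centered_gram(x), centered_gram(y)
    return hsic(kc, lc) / np.sqrt(hsic(kc, kc) * hsic(lc, lc))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for the activations of two layers on the same 64 inputs.
    layer_a = rng.normal(size=(64, 8))
    layer_b = rng.normal(size=(64, 5))
    print("nHSIC(a, a):", normalized_hsic(layer_a, layer_a))
    print("nHSIC(a, b):", normalized_hsic(layer_a, layer_b))
```

With a linear kernel the score lies in [0, 1], equals 1 for identical activations (up to floating-point error), and is unchanged by uniformly rescaling or shifting either input. The abstract does not specify how per-layer scores are aggregated into layer importance, so that step is left out here.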

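Source journal

International Journal of Computer Vision (Engineering and Technology: Computer Science, Artificial Intelligence)

CiteScore: 29.80

Self-citation rate: 2.10%

Articles published: 163

Review time: 6 months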

Journal description: The International Journal of Computer Vision (IJCV) is a platform for sharing new research findings in the rapidly growing field of computer vision. It publishes 12 issues annually and presents high-quality, original contributions to the science and engineering of computer vision. The journal accepts several types of articles. Regular articles, of up to 25 journal pages, report significant technical advances of broad interest to the field. Short articles, limited to 10 pages, offer a faster publication path for novel research results. Survey articles, of up to 30 pages, provide critical evaluations of the state of the art in computer vision or tutorial presentations of relevant topics. In addition to technical articles, the journal publishes book reviews, position papers, and editorials by prominent scientific figures, which complement the technical content. Authors are encouraged to include supplementary material online, such as images, video sequences, data sets, and software, to improve the understanding and reproducibility of the published research.