An Information Theory-Inspired Strategy for Automated Network Pruning

Impact Factor: 11.6 · CAS Tier 2 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence)
Xiawu Zheng, Yuexiao Ma, Teng Xi, Gang Zhang, Errui Ding, Yuchao Li, Jie Chen, Yonghong Tian, Rongrong Ji
DOI: 10.1007/s11263-025-02437-z
Journal: International Journal of Computer Vision
Published: 2025-05-12 (Journal Article)
Citations: 0

Abstract

Despite their superior performance on many computer vision tasks, deep neural networks demand substantial computing power and a large memory footprint. Most existing network pruning methods require laborious human effort and prohibitive computational resources, especially when the resource constraints change; this limits the practical use of model compression when a model must be deployed across a wide range of devices. Moreover, existing methods lack theoretical guidance relating pruning decisions to the generalization error. In this paper, we propose an information theory-inspired strategy for automated network pruning, grounded in the information bottleneck principle. Concretely, we introduce a new theorem showing that hidden representations should mutually compress information to achieve better generalization. Building on this, we adopt the normalized Hilbert-Schmidt Independence Criterion (HSIC) on network activations as a stable and generalized indicator of layer importance. Given a resource constraint, we combine the HSIC indicator with the constraint to transform the architecture search problem into a linear program with quadratic constraints, which a convex optimization method solves within a few seconds. We also provide a rigorous proof that optimizing the normalized HSIC simultaneously minimizes the mutual information between different layers. Without any search process, our method achieves better compression trade-offs than state-of-the-art compression algorithms. For instance, on ResNet-50 we achieve a 45.3% FLOPs reduction with 75.75% top-1 accuracy on ImageNet. Code is available at https://github.com/MAC-AutoML/ITPruner.
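The indicator at the heart of the method, the normalized HSIC between two layers' activations, can be sketched as follows. This is a minimal illustration and not the authors' released implementation (see the linked repository for that); it assumes a Gaussian kernel with the common median-distance bandwidth heuristic and the biased empirical HSIC estimator.

```python
import numpy as np

def _gram_rbf(A):
    """Gaussian (RBF) Gram matrix using the median-distance bandwidth heuristic."""
    sq = np.sum(A ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * A @ A.T, 0.0)
    med = np.median(d2[d2 > 0])          # median squared pairwise distance
    return np.exp(-d2 / med)

def _center(K):
    """Double-center a Gram matrix: H K H with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def hsic(X, Y):
    """Biased empirical HSIC between two activation matrices of shape (n, d)."""
    n = X.shape[0]
    return np.trace(_center(_gram_rbf(X)) @ _center(_gram_rbf(Y))) / (n - 1) ** 2

def normalized_hsic(X, Y):
    """nHSIC in [0, 1]: 1 for fully dependent activations, near 0 for independent."""
    return hsic(X, Y) / np.sqrt(hsic(X, X) * hsic(Y, Y))
```

In the paper's setting, `X` and `Y` would hold flattened activations of two layers over a batch of sampled inputs, and lower nHSIC between a layer and its neighbors indicates the layer carries more independent information, which feeds into the layer-importance scores used by the convex program.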

Source journal: International Journal of Computer Vision (Engineering & Technology - Computer Science: Artificial Intelligence)
CiteScore: 29.80
Self-citation rate: 2.10%
Annual articles: 163
Review time: 6 months
About the journal: The International Journal of Computer Vision (IJCV) publishes new research findings in the rapidly growing field of computer vision, issuing 12 issues annually of high-quality, original contributions to the science and engineering of the field.

The journal accepts several article types. Regular articles (up to 25 journal pages) present significant technical advances of broad interest to the field. Short articles (up to 10 pages) offer a faster publication path for novel research results. Survey articles (up to 30 pages) critically evaluate the current state of the art or provide tutorial presentations of relevant topics.

IJCV also publishes book reviews, position papers, and editorials by prominent scientific figures. Authors are encouraged to include supplementary material online, such as images, video sequences, data sets, and software, to enhance the understanding and reproducibility of the published research.