Efficient Distortion-Minimized Layerwise Pruning

Kaixin Xu; Zhe Wang; Runtao Huang; Xue Geng; Jie Lin; Xulei Yang; Min Wu; Xiaoli Li; Weisi Lin

IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 10, pp. 9298-9315, 2025. DOI: 10.1109/TPAMI.2025.3586418. https://ieeexplore.ieee.org/document/11072282/
Abstract
In this paper, we propose a post-training pruning framework that jointly optimizes layerwise pruning to minimize model output distortion. Through theoretical and empirical analysis, we discover an important additivity property of the output distortion induced by pruning weights/channels in DNNs. Leveraging this property, we reformulate pruning optimization as a combinatorial problem and solve it with dynamic programming, achieving linear time complexity and making the algorithm very fast on CPUs. Furthermore, we optimize the additivity-derived distortions using a Hessian-based Taylor approximation to enhance pruning efficiency, accompanied by fine-grained complexity-reduction techniques. Our method is evaluated on various DNN architectures, including CNNs, ViTs, and object detectors, and on vision tasks such as image classification on CIFAR-10 and ImageNet, and 3D object detection on various datasets. We achieve state-of-the-art results with significant FLOPs reductions and no accuracy loss. Specifically, on CIFAR-10, we achieve up to $27.9\times$, $29.2\times$, and $14.9\times$ FLOPs reductions on ResNet-32, VGG-16, and DenseNet-121, respectively. On ImageNet, we observe no accuracy loss with $1.69\times$ and $2\times$ FLOPs reductions on ResNet-50 and DeiT-Base, respectively. For 3D object detection, we achieve $3.89\times$ and $3.72\times$ FLOPs reductions on the CenterPoint and PVRCNN models. These results demonstrate the effectiveness and practicality of our approach for improving model performance through layer-adaptive weight pruning.
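The allocation step described in the abstract can be illustrated with a minimal sketch. Assuming the additivity property holds, choosing one pruning level per layer under a FLOPs budget becomes a multiple-choice knapsack, which dynamic programming solves in time linear in the number of layers for a fixed budget discretization. All names below (`allocate_sparsity`, the distortion/cost tables) are illustrative, not the authors' code, and the exact DP formulation in the paper may differ:

```python
# Hedged sketch: layerwise sparsity allocation by dynamic programming,
# under the assumption that per-layer output distortions add up.
# distortion[l][k]: distortion caused by pruning layer l at candidate level k
# cost[l][k]: FLOPs units retained by that choice (discretized integers)
from typing import List, Tuple

def allocate_sparsity(
    distortion: List[List[float]],
    cost: List[List[int]],
    budget: int,
) -> Tuple[float, List[int]]:
    """Pick one pruning option per layer to minimize total (additive)
    distortion subject to a FLOPs budget. O(L * K * budget) time."""
    L = len(distortion)
    INF = float("inf")
    # dp[b] = min distortion over processed layers using exactly b FLOPs units
    dp = [0.0] + [INF] * budget
    choice = [[-1] * (budget + 1) for _ in range(L)]

    for l in range(L):
        new_dp = [INF] * (budget + 1)
        for b in range(budget + 1):
            if dp[b] == INF:
                continue
            for k, (d, c) in enumerate(zip(distortion[l], cost[l])):
                nb = b + c
                if nb <= budget and dp[b] + d < new_dp[nb]:
                    new_dp[nb] = dp[b] + d
                    choice[l][nb] = k
        dp = new_dp

    best_b = min(range(budget + 1), key=lambda b: dp[b])
    if dp[best_b] == INF:
        raise ValueError("budget infeasible for the given options")
    picks = []  # backtrack the chosen option per layer
    b = best_b
    for l in range(L - 1, -1, -1):
        k = choice[l][b]
        picks.append(k)
        b -= cost[l][k]
    picks.reverse()
    return dp[best_b], picks

# Toy usage: 2 layers, 3 candidate levels each (cost 4 = unpruned).
d = [[0.0, 0.1, 0.5], [0.0, 0.2, 0.9]]
c = [[4, 2, 1], [4, 2, 1]]
print(allocate_sparsity(d, c, budget=5))  # -> (0.3, [1, 1])
```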
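For the distortion entries themselves, a common second-order estimate in the spirit of the Hessian-based Taylor approximation mentioned above scores the pruned weights by $\frac{1}{2}\sum_i H_{ii} w_i^2$ (the first-order term vanishes at a converged minimum). Whether the paper uses this exact diagonal form is an assumption, and the empirical-Fisher Hessian proxy below is likewise illustrative:

```python
import torch

def taylor_distortion(weight: torch.Tensor, hessian_diag: torch.Tensor,
                      prune_mask: torch.Tensor) -> torch.Tensor:
    """Second-order Taylor estimate of the distortion from zeroing the
    masked weights: 0.5 * sum_i H_ii * w_i^2 over pruned entries
    (diagonal-Hessian assumption, an illustrative stand-in)."""
    pruned = weight * prune_mask  # entries to be removed
    return 0.5 * (hessian_diag * pruned.pow(2)).sum()

def fisher_diag(model, loss_fn, x, y):
    """Diagonal Hessian proxy via the empirical Fisher on a calibration
    batch: squared gradients per parameter."""
    model.zero_grad()
    loss_fn(model(x), y).backward()
    return {n: p.grad.pow(2).detach()
            for n, p in model.named_parameters() if p.grad is not None}
```

Filling the `distortion` table of the previous sketch with such per-option Taylor scores, computed once on a calibration set, keeps the whole pipeline post-training, consistent with the framework the abstract describes.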