Efficient Distortion-Minimized Layerwise Pruning

Kaixin Xu; Zhe Wang; Runtao Huang; Xue Geng; Jie Lin; Xulei Yang; Min Wu; Xiaoli Li; Weisi Lin

IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 10, pp. 9298-9315, published 2025-07-07. DOI: 10.1109/TPAMI.2025.3586418. https://ieeexplore.ieee.org/document/11072282/

Abstract:
In this paper, we propose a post-training pruning framework that jointly optimizes layerwise pruning to minimize model output distortion. Through theoretical and empirical analysis, we discover an important additivity property of the output distortion caused by pruning weights/channels in DNNs. Leveraging this property, we reformulate pruning optimization as a combinatorial problem and solve it with dynamic programming, achieving linear time complexity and making the algorithm very fast on CPUs. Furthermore, we optimize the additivity-derived distortions using a Hessian-based Taylor approximation to enhance pruning efficiency, accompanied by fine-grained complexity reduction techniques. Our method is evaluated on various DNN architectures, including CNNs, ViTs, and object detectors, and on vision tasks such as image classification on CIFAR-10 and ImageNet, and 3D object detection on various datasets. We achieve state-of-the-art results with significant FLOPs reductions and no accuracy loss. Specifically, on CIFAR-10, we achieve up to $27.9\times$, $29.2\times$, and $14.9\times$ FLOPs reductions on ResNet-32, VGG-16, and DenseNet-121, respectively. On ImageNet, we observe no accuracy loss with $1.69\times$ and $2\times$ FLOPs reductions on ResNet-50 and DeiT-Base, respectively. For 3D object detection, we achieve $3.89\times$ and $3.72\times$ FLOPs reductions on the CenterPoint and PV-RCNN models. These results demonstrate the effectiveness and practicality of our approach for improving model performance through layer-adaptive weight pruning.
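The combinatorial reformulation can be sketched as follows. Assuming distortion is additive across layers (the paper's additivity property), choosing one pruning option per layer to minimize total output distortion under a FLOPs budget is a knapsack-style problem solvable by dynamic programming. This generic DP is pseudo-polynomial in the budget rather than the paper's linear-time formulation, and all names and numbers below are illustrative assumptions, not the authors' implementation.

```python
# Sketch: budgeted layerwise pruning allocation under an additive-distortion
# assumption. Each layer l offers candidate options k with a measured
# distortion[l][k] and an integer FLOPs cost[l][k]; total distortion is
# assumed to be the sum of per-layer distortions.

def allocate_pruning(distortion, cost, budget):
    """Return (min total distortion, chosen option index per layer)
    over all per-layer choices whose total cost fits within `budget`."""
    INF = float("inf")
    num_layers = len(distortion)
    dp = [INF] * (budget + 1)  # dp[b] = best distortion at exact total cost b
    dp[0] = 0.0
    choice = [[-1] * (budget + 1) for _ in range(num_layers)]
    for l in range(num_layers):
        ndp = [INF] * (budget + 1)
        for b in range(budget + 1):
            if dp[b] == INF:
                continue
            for k, (d, c) in enumerate(zip(distortion[l], cost[l])):
                nb = b + c
                if nb <= budget and dp[b] + d < ndp[nb]:
                    ndp[nb] = dp[b] + d
                    choice[l][nb] = k  # remember the option taken for layer l
        dp = ndp
    # Pick the cheapest-distortion reachable budget, then backtrack choices.
    best_b = min(range(budget + 1), key=lambda b: dp[b])
    picks, b = [], best_b
    for l in reversed(range(num_layers)):
        k = choice[l][b]
        picks.append(k)
        b -= cost[l][k]
    picks.reverse()
    return dp[best_b], picks
```

For example, with two layers each offering "keep all" (cost 4, zero distortion) or "prune half" (cost 2, some distortion), a budget of 6 forces pruning exactly one layer, and the DP picks the layer whose pruning distorts the output least.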
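The Hessian-based Taylor approximation of distortion can be illustrated with the classic second-order saliency estimate from Optimal Brain Damage; the paper's exact approximation may differ, and `taylor_saliency` is a hypothetical helper name.

```python
import numpy as np

# Sketch: second-order Taylor estimate of the loss increase from zeroing
# each weight. At a local minimum the gradient term vanishes, leaving
# delta_i ~= 0.5 * H_ii * w_i**2, where H_ii is the diagonal Hessian entry.

def taylor_saliency(w, hess_diag):
    """Per-weight pruning distortion estimate from a diagonal Hessian."""
    return 0.5 * hess_diag * w ** 2
```

Weights with low saliency (small magnitude and flat curvature) are the cheapest to prune under this estimate.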