{"title":"Intermediate-grained kernel elements pruning with structured sparsity","authors":"","doi":"10.1016/j.neunet.2024.106708","DOIUrl":null,"url":null,"abstract":"<div><p>Neural network pruning provides a promising prospect for the deployment of neural networks on embedded or mobile devices with limited resources. Although current structured strategies are unconstrained by specific hardware architecture in the phase of forward inference, the decline in classification accuracy of structured methods is beyond the tolerance at the level of general pruning rate. This inspires us to develop a technique that satisfies high pruning rate with a small decline in accuracy and has the general nature of structured pruning. In this paper, we propose a new pruning method, namely KEP (Kernel Elements Pruning), to compress deep convolutional neural networks by exploring the significance of elements in each kernel plane and removing unimportant elements. In this method, we apply a controllable regularization penalty to constrain unimportant elements by adding a prior knowledge mask and obtain a compact model. In the calculation procedure of forward inference, we introduce a sparse convolution operation which is different from the sliding window to eliminate invalid zero calculations and verify the effectiveness of the operation for further deployment on FPGA. A massive variety of experiments demonstrate the effectiveness of KEP on two datasets: CIFAR-10 and ImageNet. Specially, with few indexes of non-zero weights introduced, KEP has a significant improvement over the latest structured methods in terms of parameter and float-point operation (FLOPs) reduction, and performs well on large datasets.</p></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":null,"pages":null},"PeriodicalIF":6.0000,"publicationDate":"2024-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Networks","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0893608024006324","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Neural network pruning provides a promising prospect for the deployment of neural networks on embedded or mobile devices with limited resources. Although current structured strategies are unconstrained by specific hardware architecture in the phase of forward inference, the decline in classification accuracy of structured methods is beyond the tolerance at the level of general pruning rate. This inspires us to develop a technique that satisfies high pruning rate with a small decline in accuracy and has the general nature of structured pruning. In this paper, we propose a new pruning method, namely KEP (Kernel Elements Pruning), to compress deep convolutional neural networks by exploring the significance of elements in each kernel plane and removing unimportant elements. In this method, we apply a controllable regularization penalty to constrain unimportant elements by adding a prior knowledge mask and obtain a compact model. In the calculation procedure of forward inference, we introduce a sparse convolution operation which is different from the sliding window to eliminate invalid zero calculations and verify the effectiveness of the operation for further deployment on FPGA. A massive variety of experiments demonstrate the effectiveness of KEP on two datasets: CIFAR-10 and ImageNet. Specially, with few indexes of non-zero weights introduced, KEP has a significant improvement over the latest structured methods in terms of parameter and float-point operation (FLOPs) reduction, and performs well on large datasets.
期刊介绍:
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.