Hardware-Aware Evolutionary Explainable Filter Pruning for Convolutional Neural Networks

IF 0.9 · CAS Tier 4 (Computer Science) · JCR Q3 (Computer Science, Theory & Methods)
Christian Heidorn, Muhammad Sabih, Nicolai Meyerhöfer, Christian Schinabeck, Jürgen Teich, Frank Hannig
{"title":"针对卷积神经网络的硬件感知进化可解释滤波器修剪","authors":"Christian Heidorn, Muhammad Sabih, Nicolai Meyerhöfer, Christian Schinabeck, Jürgen Teich, Frank Hannig","doi":"10.1007/s10766-024-00760-5","DOIUrl":null,"url":null,"abstract":"<p>Filter pruning of convolutional neural networks (CNNs) is a common technique to effectively reduce the memory footprint, the number of arithmetic operations, and, consequently, inference time. Recent pruning approaches also consider the targeted device (i.e., graphics processing units) for CNN deployment to reduce the actual inference time. However, simple metrics, such as the <span>\\(\\ell ^1\\)</span>-norm, are used for deciding which filters to prune. In this work, we propose a hardware-aware technique to explore the vast multi-objective design space of possible filter pruning configurations. Our approach incorporates not only the targeted device but also techniques from explainable artificial intelligence for ranking and deciding which filters to prune. For each layer, the number of filters to be pruned is optimized with the objective of minimizing the inference time and the error rate of the CNN. Experimental results show that our approach can speed up inference time by 1.40× and 1.30× for VGG-16 on the CIFAR-10 dataset and ResNet-18 on the ILSVRC-2012 dataset, respectively, compared to the state-of-the-art ABCPruner.</p>","PeriodicalId":14313,"journal":{"name":"International Journal of Parallel Programming","volume":"819 1","pages":""},"PeriodicalIF":0.9000,"publicationDate":"2024-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Hardware-Aware Evolutionary Explainable Filter Pruning for Convolutional Neural Networks\",\"authors\":\"Christian Heidorn, Muhammad Sabih, Nicolai Meyerhöfer, Christian Schinabeck, Jürgen Teich, Frank Hannig\",\"doi\":\"10.1007/s10766-024-00760-5\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Filter pruning of convolutional neural networks (CNNs) is a common technique to effectively reduce the memory footprint, the number of arithmetic operations, and, consequently, inference time. Recent pruning approaches also consider the targeted device (i.e., graphics processing units) for CNN deployment to reduce the actual inference time. However, simple metrics, such as the <span>\\\\(\\\\ell ^1\\\\)</span>-norm, are used for deciding which filters to prune. In this work, we propose a hardware-aware technique to explore the vast multi-objective design space of possible filter pruning configurations. Our approach incorporates not only the targeted device but also techniques from explainable artificial intelligence for ranking and deciding which filters to prune. For each layer, the number of filters to be pruned is optimized with the objective of minimizing the inference time and the error rate of the CNN. 
Experimental results show that our approach can speed up inference time by 1.40× and 1.30× for VGG-16 on the CIFAR-10 dataset and ResNet-18 on the ILSVRC-2012 dataset, respectively, compared to the state-of-the-art ABCPruner.</p>\",\"PeriodicalId\":14313,\"journal\":{\"name\":\"International Journal of Parallel Programming\",\"volume\":\"819 1\",\"pages\":\"\"},\"PeriodicalIF\":0.9000,\"publicationDate\":\"2024-02-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Parallel Programming\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s10766-024-00760-5\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Parallel Programming","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10766-024-00760-5","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Citations: 0

Abstract


Filter pruning of convolutional neural networks (CNNs) is a common technique to effectively reduce the memory footprint, the number of arithmetic operations, and, consequently, inference time. Recent pruning approaches also consider the targeted device (i.e., graphics processing units) for CNN deployment to reduce the actual inference time. However, only simple metrics, such as the \(\ell^1\)-norm, are used to decide which filters to prune. In this work, we propose a hardware-aware technique to explore the vast multi-objective design space of possible filter pruning configurations. Our approach incorporates not only the targeted device but also techniques from explainable artificial intelligence for ranking and deciding which filters to prune. For each layer, the number of filters to be pruned is optimized with the objective of minimizing the inference time and the error rate of the CNN. Experimental results show that our approach speeds up inference by 1.40× and 1.30× for VGG-16 on the CIFAR-10 dataset and ResNet-18 on the ILSVRC-2012 dataset, respectively, compared to the state-of-the-art ABCPruner.
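The abstract contrasts the paper's explainability-driven, evolutionary approach with the common \(\ell^1\)-norm criterion. For reference, below is a minimal PyTorch sketch of that \(\ell^1\)-norm baseline, not the authors' method: each output filter of a convolutional layer is scored by the \(\ell^1\)-norm of its weights, and the lowest-scoring filters are removed. All function names and the per-layer pruning count are illustrative.

```python
# Minimal sketch of l1-norm filter pruning (the baseline the paper contrasts
# with, not the paper's implementation). Names here are illustrative.
import torch
import torch.nn as nn


def l1_filter_scores(conv: nn.Conv2d) -> torch.Tensor:
    """Score each output filter by the l1-norm of its weights."""
    # conv.weight has shape (out_channels, in_channels, kH, kW);
    # sum absolute values over every dimension except the filter axis.
    return conv.weight.detach().abs().sum(dim=(1, 2, 3))


def prune_filters(conv: nn.Conv2d, num_to_prune: int) -> nn.Conv2d:
    """Return a new Conv2d with the lowest-scoring filters removed."""
    scores = l1_filter_scores(conv)
    keep = torch.argsort(scores, descending=True)[: conv.out_channels - num_to_prune]
    keep, _ = torch.sort(keep)  # preserve the original filter order

    pruned = nn.Conv2d(
        conv.in_channels,
        len(keep),
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        bias=conv.bias is not None,
    )
    with torch.no_grad():
        pruned.weight.copy_(conv.weight[keep])
        if conv.bias is not None:
            pruned.bias.copy_(conv.bias[keep])
    # Note: in a full network, the next layer's input channels
    # must also be pruned to match the reduced output channels.
    return pruned


if __name__ == "__main__":
    layer = nn.Conv2d(3, 64, kernel_size=3, padding=1)
    smaller = prune_filters(layer, num_to_prune=16)
    x = torch.randn(1, 3, 32, 32)
    print(smaller(x).shape)  # torch.Size([1, 48, 32, 32])
```

Per the abstract, the paper replaces this per-filter scoring with rankings derived from explainable-AI techniques, and chooses the number of filters to prune per layer via a multi-objective evolutionary search that trades off inference time on the target device against error rate.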

Source Journal

International Journal of Parallel Programming (Engineering & Technology – Computer Science: Theory & Methods)
CiteScore: 4.40
Self-citation rate: 0.00%
Articles per year: 15
Review time: >12 weeks
Journal description: International Journal of Parallel Programming is a forum for the publication of peer-reviewed, high-quality original papers in the computer and information sciences, focusing specifically on programming aspects of parallel computing systems. Such systems are characterized by the coexistence over time of multiple coordinated activities. The journal publishes both original research and survey papers. Fields of interest include: linguistic foundations, conceptual frameworks, high-level languages, evaluation methods, implementation techniques, programming support systems, pragmatic considerations, architectural characteristics, software engineering aspects, advances in parallel algorithms, performance studies, and application studies.