Inference Time Reduction of Deep Neural Networks on Embedded Devices: A Case Study

Isma-Ilou Sadou, Seyed Morteza Nabavinejad, Zhonghai Lu, Masoumeh Ebrahimi
{"title":"嵌入式设备上深度神经网络的推理时间缩减:一个案例研究","authors":"Isma-Ilou Sadou, Seyed Morteza Nabavinejad, Zhonghai Lu, Masoumeh Ebrahimi","doi":"10.1109/DSD57027.2022.00036","DOIUrl":null,"url":null,"abstract":"From object detection to semantic segmentation, deep learning has achieved many groundbreaking results in recent years. However, due to the increasing complexity, the execution of neural networks on embedded platforms is greatly hindered. This has motivated the development of several neural network minimisation techniques, amongst which pruning has gained a lot of focus. In this work, we perform a case study on a series of methods with the goal of finding a small model that could run fast on embedded devices. First, we suggest a simple, but effective, ranking criterion for filter pruning called Mean Weight. Then, we combine this new criterion with a threshold-aware layer-sensitive filter pruning method, called T-sensitive pruning, to gain high accuracy. Further, the pruning algorithm follows a structured filter pruning approach that removes all selected filters and their dependencies from the DNN model, leading to less computations, and thus low inference time in lower-end CPUs. To validate the effectiveness of the proposed method, we perform experiments on three different datasets (with 3, 101, and 1000 classes) and two different deep neural networks (i.e., SICK-Net and MobileNet V1). We have obtained speedups of up to 13x on lower-end CPUs (Armv8) with less than 1% drop in accuracy. This satisfies the goal of transferring deep neural networks to embedded hardware while attaining a good trade-off between inference time and accuracy.","PeriodicalId":211723,"journal":{"name":"2022 25th Euromicro Conference on Digital System Design (DSD)","volume":"192 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Inference Time Reduction of Deep Neural Networks on Embedded Devices: A Case Study\",\"authors\":\"Isma-Ilou Sadou, Seyed Morteza Nabavinejad, Zhonghai Lu, Masoumeh Ebrahimi\",\"doi\":\"10.1109/DSD57027.2022.00036\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"From object detection to semantic segmentation, deep learning has achieved many groundbreaking results in recent years. However, due to the increasing complexity, the execution of neural networks on embedded platforms is greatly hindered. This has motivated the development of several neural network minimisation techniques, amongst which pruning has gained a lot of focus. In this work, we perform a case study on a series of methods with the goal of finding a small model that could run fast on embedded devices. First, we suggest a simple, but effective, ranking criterion for filter pruning called Mean Weight. Then, we combine this new criterion with a threshold-aware layer-sensitive filter pruning method, called T-sensitive pruning, to gain high accuracy. Further, the pruning algorithm follows a structured filter pruning approach that removes all selected filters and their dependencies from the DNN model, leading to less computations, and thus low inference time in lower-end CPUs. To validate the effectiveness of the proposed method, we perform experiments on three different datasets (with 3, 101, and 1000 classes) and two different deep neural networks (i.e., SICK-Net and MobileNet V1). We have obtained speedups of up to 13x on lower-end CPUs (Armv8) with less than 1% drop in accuracy. 
This satisfies the goal of transferring deep neural networks to embedded hardware while attaining a good trade-off between inference time and accuracy.\",\"PeriodicalId\":211723,\"journal\":{\"name\":\"2022 25th Euromicro Conference on Digital System Design (DSD)\",\"volume\":\"192 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 25th Euromicro Conference on Digital System Design (DSD)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DSD57027.2022.00036\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 25th Euromicro Conference on Digital System Design (DSD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DSD57027.2022.00036","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

From object detection to semantic segmentation, deep learning has achieved many groundbreaking results in recent years. However, the growing complexity of these models greatly hinders their execution on embedded platforms. This has motivated the development of several neural network minimisation techniques, among which pruning has received particular attention. In this work, we perform a case study on a series of methods with the goal of finding a small model that runs fast on embedded devices. First, we propose a simple but effective ranking criterion for filter pruning called Mean Weight. Then, we combine this new criterion with a threshold-aware, layer-sensitive filter pruning method, called T-sensitive pruning, to retain high accuracy. Further, the pruning algorithm follows a structured filter pruning approach that removes all selected filters and their dependencies from the DNN model, leading to fewer computations and thus lower inference time on lower-end CPUs. To validate the effectiveness of the proposed method, we perform experiments on three different datasets (with 3, 101, and 1000 classes) and two different deep neural networks (SICK-Net and MobileNet V1). We obtain speedups of up to 13x on lower-end CPUs (Armv8) with less than a 1% drop in accuracy. This satisfies the goal of transferring deep neural networks to embedded hardware while attaining a good trade-off between inference time and accuracy.
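
The abstract names the Mean Weight criterion and structured filter removal but does not define them formally. Below is a minimal PyTorch sketch under one plausible reading: Mean Weight is taken to be the mean of a filter's absolute weights, and structured pruning rebuilds the layer without the discarded filters. The helpers `mean_weight_scores` and `prune_conv_filters`, and the top-k selection in the usage example, are illustrative assumptions and not the authors' implementation.

```python
# Illustrative sketch only; the paper publishes no code here, so every
# function name and detail below is an assumption, not the authors' method.
import torch
import torch.nn as nn


def mean_weight_scores(conv: nn.Conv2d) -> torch.Tensor:
    """One plausible reading of the 'Mean Weight' criterion: score each
    output filter by the mean of its absolute weights."""
    # conv.weight has shape (out_channels, in_channels, kH, kW); reduce over
    # everything except the filter (output-channel) axis.
    return conv.weight.detach().abs().mean(dim=(1, 2, 3))


def prune_conv_filters(conv: nn.Conv2d, keep: list[int]) -> nn.Conv2d:
    """Structured pruning: rebuild the layer with only the kept filters.
    In a full pipeline, consumers of these channels must have the matching
    *input* channels removed too (the 'dependencies' mentioned in the
    abstract); this toy sketch does not handle that."""
    new_conv = nn.Conv2d(conv.in_channels, len(keep), conv.kernel_size,
                         stride=conv.stride, padding=conv.padding,
                         bias=conv.bias is not None)
    with torch.no_grad():
        new_conv.weight.copy_(conv.weight[keep])
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias[keep])
    return new_conv


# Usage: score a toy layer and keep the 24 highest-ranked of its 32 filters.
# (T-sensitive pruning would instead derive a per-layer threshold; the
# abstract does not describe that step in enough detail to reproduce.)
layer = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)
keep = mean_weight_scores(layer).topk(k=24).indices.tolist()
smaller = prune_conv_filters(layer, keep)
print(tuple(layer.weight.shape), "->", tuple(smaller.weight.shape))
```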