HW-Flow: A Multi-Abstraction Level HW-CNN Codesign Pruning Methodology

Manoj-Rohit Vemparala, Nael Fasfous, Alexander Frickenstein, Emanuele Valpreda, Manfredi Camalleri, Qi Zhao, Christian Unger, Naveen-Shankar Nagaraja, Maurizio Martina, Walter Stechele
{"title":"HW-Flow: A Multi-Abstraction Level HW-CNN Codesign Pruning Methodology","authors":"M. Vemparala, Nael Fasfous, Alexander Frickenstein, Emanuele Valpreda, Manfredi Camalleri, Qi Zhao, C. Unger, N. Nagaraja, Maurizio Martina, W. Stechele","doi":"10.4230/LITES.8.1.3","DOIUrl":null,"url":null,"abstract":"Convolutional neural networks (CNNs) have produced unprecedented accuracy for many computer vision problems in the recent past. In power and compute-constrained embedded platforms, deploying modern CNNs can present many challenges. Most CNN architectures do not run in real-time due to the high number of computational operations involved during the inference phase. This emphasizes the role of CNN optimization techniques in early design space exploration. To estimate their efficacy in satisfying the target constraints, existing techniques are either hardware (HW) agnostic, pseudo-HW-aware by considering parameter and operation counts, or HW-aware through inflexible hardware-in-the-loop (HIL) setups. In this work, we introduce HW-Flow, a framework for optimizing and exploring CNN models based on three levels of hardware abstraction: Coarse, Mid and Fine. Through these levels, CNN design and optimization can be iteratively refined towards efficient execution on the target hardware platform. We present HWFlow in the context of CNN pruning by augmenting a reinforcement learning agent with key metrics to understand the influence of its pruning actions on the inference hardware. With 2× reduction in energy and latency, we prune ResNet56, ResNet50, and DeepLabv3 with minimal accuracy degradation on the CIFAR-10, ImageNet, and CityScapes datasets, respectively. 2012 ACM Subject Classification Computing Methodologies → Artificial intelligence","PeriodicalId":376325,"journal":{"name":"Leibniz Trans. Embed. Syst.","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Leibniz Trans. Embed. Syst.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4230/LITES.8.1.3","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Convolutional neural networks (CNNs) have produced unprecedented accuracy for many computer vision problems in the recent past. On power- and compute-constrained embedded platforms, deploying modern CNNs can present many challenges. Most CNN architectures do not run in real-time due to the high number of computational operations involved during the inference phase. This emphasizes the role of CNN optimization techniques in early design space exploration. To estimate their efficacy in satisfying the target constraints, existing techniques are either hardware (HW) agnostic, pseudo-HW-aware by considering parameter and operation counts, or HW-aware through inflexible hardware-in-the-loop (HIL) setups. In this work, we introduce HW-Flow, a framework for optimizing and exploring CNN models based on three levels of hardware abstraction: Coarse, Mid and Fine. Through these levels, CNN design and optimization can be iteratively refined towards efficient execution on the target hardware platform. We present HW-Flow in the context of CNN pruning by augmenting a reinforcement learning agent with key metrics to understand the influence of its pruning actions on the inference hardware. With a 2× reduction in energy and latency, we prune ResNet56, ResNet50, and DeepLabv3 with minimal accuracy degradation on the CIFAR-10, ImageNet, and Cityscapes datasets, respectively.

2012 ACM Subject Classification: Computing Methodologies → Artificial intelligence
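To make the abstract's core idea concrete, the following is a minimal Python sketch (not the authors' code) of hardware-aware pruning as described above: an agent proposes per-layer pruning ratios, and its reward is augmented with an energy estimate from a hardware model at a chosen abstraction level. All names here (LayerShape, coarse_energy, reward) and the penalty form are hypothetical illustrations; the paper's Mid and Fine levels would replace the coarse MAC-proportional estimator with dataflow- and memory-aware models.

    """Toy illustration of HW-augmented pruning reward; all names are hypothetical."""
    from dataclasses import dataclass

    @dataclass
    class LayerShape:
        """Simplified conv-layer descriptor used by the cost estimators."""
        in_ch: int
        out_ch: int
        k: int        # kernel size
        out_hw: int   # output feature-map height/width

    def macs(layer: LayerShape) -> int:
        """Multiply-accumulate count of one conv layer (a pseudo-HW-aware proxy)."""
        return layer.in_ch * layer.out_ch * layer.k * layer.k * layer.out_hw ** 2

    def coarse_energy(layer: LayerShape, keep_ratio: float) -> float:
        """Coarse abstraction: energy taken as proportional to remaining MACs."""
        return macs(layer) * keep_ratio  # arbitrary energy unit

    def reward(accuracy: float, energy: float, energy_budget: float) -> float:
        """HW-augmented reward: accuracy, penalized when over the energy budget.
        The linear over-budget penalty is an assumption for illustration."""
        over = max(0.0, energy / energy_budget - 1.0)
        return accuracy - over

    # Toy episode: the agent proposes a keep-ratio per layer; we score the result.
    layers = [LayerShape(64, 64, 3, 32), LayerShape(64, 128, 3, 16)]
    keep_ratios = [0.5, 0.6]                       # one pruning action per layer
    total = sum(coarse_energy(l, r) for l, r in zip(layers, keep_ratios))
    budget = 0.5 * sum(coarse_energy(l, 1.0) for l in layers)  # target 2x reduction
    print(reward(accuracy=0.92, energy=total, energy_budget=budget))

In this toy episode the proposed ratios slightly exceed the 2× energy budget, so the reward is docked below the raw accuracy, steering the agent toward more aggressive pruning of the costlier layers.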