An online guided tuning approach to run CNN pipelines on edge devices

Proceedings of the 18th ACM International Conference on Computing Frontiers. Pub Date: 2021-05-11. DOI: 10.1145/3457388.3458662

Pirah Noor Soomro, M. Abduljabbar, J. Castrillón, M. Pericàs


Citations: 4

Abstract

Modern edge and mobile devices are equipped with powerful computing resources. These are often organized as heterogeneous multi-cores featuring performance-asymmetric core clusters. This raises the question of how to execute the inference pass of convolutional neural networks (CNNs) effectively on such devices. Existing CNN implementations on edge devices rely on offline profiling data to determine a better schedule for CNN applications. This approach requires a time-consuming phase in which a performance profile is generated for each type of representative kernel on the various core configurations available on the device, coupled with a search space exploration. We propose an online tuning technique that uses compile-time hints and online profiling data to generate high-throughput CNN pipelines. We explore core heterogeneity and compatible core-layer configurations through an online guided search. Instead of an exhaustive search, we adopt an evolutionary approach with a guided starting point to find the solution. We show that by pruning and navigating the complex search space using compile-time hints, 79% of the tested configurations turn out to be near-optimal candidates for a throughput-maximizing pipeline on the NVIDIA Jetson TX2 platform.
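The abstract does not spell out the search procedure, so the following is only a minimal sketch of the general idea and not the authors' implementation. It assumes a CNN is split into contiguous pipeline stages, one per core cluster, and that throughput is limited by the slowest stage. The names `guided_start` and `evolve`, the hint values, the cluster speeds, and the simulated cost table are all hypothetical. The search shown is a plain (1+1) evolutionary hill climb, swapped in for illustration.

```python
import random

def bottleneck(cuts, layer_cost, stage_speed):
    """Slowest stage time for contiguous stages delimited by `cuts`."""
    bounds = [0, *cuts, len(layer_cost)]
    return max(sum(layer_cost[a:b]) / s
               for a, b, s in zip(bounds, bounds[1:], stage_speed))

def guided_start(hints, stage_speed):
    """Place cuts so each stage's share of hinted work matches its speed share.

    Assumes len(hints) >= len(stage_speed) and positive speeds.
    """
    n, k = len(hints), len(stage_speed)
    prefix = [0.0]
    for h in hints:
        prefix.append(prefix[-1] + h)
    cuts, speed_so_far = [], 0.0
    for i in range(k - 1):
        speed_so_far += stage_speed[i]
        target = prefix[-1] * speed_so_far / sum(stage_speed)
        cut = next((j for j in range(1, n + 1) if prefix[j] >= target), n)
        lo = cuts[-1] + 1 if cuts else 1   # keep cuts strictly increasing
        hi = n - (k - 1 - i)               # leave a layer for each later stage
        cuts.append(min(max(cut, lo), hi))
    return cuts

def evolve(cuts, measure, n_layers, generations=200, seed=0):
    """(1+1) evolutionary search: move one cut by one layer, keep improvements."""
    rng = random.Random(seed)
    best, best_t = list(cuts), measure(cuts)
    if not best:                           # single stage: nothing to tune
        return best, best_t
    for _ in range(generations):
        child = list(best)
        i = rng.randrange(len(child))
        child[i] += rng.choice((-1, 1))
        lo = child[i - 1] + 1 if i > 0 else 1
        hi = child[i + 1] - 1 if i + 1 < len(child) else n_layers - 1
        if not lo <= child[i] <= hi:       # skip invalid stage boundaries
            continue
        t = measure(child)                 # stands in for an online profiling run
        if t < best_t:
            best, best_t = child, t
    return best, best_t

# Hypothetical data: compile-time hints (relative work per layer) and
# "true" per-layer costs that only online measurement would reveal.
hints = [4, 8, 8, 6, 2, 2]
true_cost = [3, 5, 6, 8, 4, 4]
stage_speed = [2.0, 1.0]                   # assumed fast cluster, then slow cluster

start = guided_start(hints, stage_speed)
best, best_time = evolve(
    start, lambda c: bottleneck(c, true_cost, stage_speed), len(hints))
```

In this toy setup the hints give a starting split close to the best one, so the search only needs to test a few neighbouring configurations rather than enumerate every split. On a real device, `measure` would time actual pipeline executions instead of reading a cost table.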


