One Proxy Device Is Enough for Hardware-Aware Neural Architecture Search

Bingqian Lu, Jianyi Yang, Weiwen Jiang, Yiyu Shi, Shaolei Ren
{"title":"One Proxy Device Is Enough for Hardware-Aware Neural Architecture Search","authors":"Bingqian Lu, Jianyi Yang, Weiwen Jiang, Yiyu Shi, Shaolei Ren","doi":"10.1145/3491046","DOIUrl":null,"url":null,"abstract":"Convolutional neural networks (CNNs) are used in numerous real-world applications such as vision-based autonomous driving and video content analysis. To run CNN inference on various target devices, hardware-aware neural architecture search (NAS) is crucial. A key requirement of efficient hardware-aware NAS is the fast evaluation of inference latencies in order to rank different architectures. While building a latency predictor for each target device has been commonly used in state of the art, this is a very time-consuming process, lacking scalability in the presence of extremely diverse devices. In this work, we address the scalability challenge by exploiting latency monotonicity --- the architecture latency rankings on different devices are often correlated. When strong latency monotonicity exists, we can re-use architectures searched for one proxy device on new target devices, without losing optimality. In the absence of strong latency monotonicity, we propose an efficient proxy adaptation technique to significantly boost the latency monotonicity. Finally, we validate our approach and conduct experiments with devices of different platforms on multiple mainstream search spaces, including MobileNet-V2, MobileNet-V3, NAS-Bench-201, ProxylessNAS and FBNet. Our results highlight that, by using just one proxy device, we can find almost the same Pareto-optimal architectures as the existing per-device NAS, while avoiding the prohibitive cost of building a latency predictor for each device.","PeriodicalId":426760,"journal":{"name":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","volume":"42 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3491046","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 12

Abstract

Convolutional neural networks (CNNs) are used in numerous real-world applications such as vision-based autonomous driving and video content analysis. To run CNN inference on various target devices, hardware-aware neural architecture search (NAS) is crucial. A key requirement of efficient hardware-aware NAS is the fast evaluation of inference latencies in order to rank different architectures. While building a latency predictor for each target device is common practice in the state of the art, it is a very time-consuming process that lacks scalability in the presence of extremely diverse devices. In this work, we address the scalability challenge by exploiting latency monotonicity: the architecture latency rankings on different devices are often correlated. When strong latency monotonicity exists, we can reuse architectures searched on one proxy device for new target devices without losing optimality. In the absence of strong latency monotonicity, we propose an efficient proxy adaptation technique that significantly boosts latency monotonicity. Finally, we validate our approach and conduct experiments with devices from different platforms on multiple mainstream search spaces, including MobileNet-V2, MobileNet-V3, NAS-Bench-201, ProxylessNAS, and FBNet. Our results highlight that, by using just one proxy device, we can find almost the same Pareto-optimal architectures as existing per-device NAS, while avoiding the prohibitive cost of building a latency predictor for each device.
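To make the notion of latency monotonicity concrete: two devices are strongly monotonic if they rank candidate architectures in (nearly) the same order by inference latency, which the Spearman rank correlation coefficient (SRCC) captures naturally. Below is a minimal illustrative sketch of this check; the latency values are invented placeholders, not measurements from the paper, and the SRCC threshold shown is only an assumed example cutoff.

```python
# A minimal sketch: quantify latency monotonicity between a proxy device
# and a target device via the Spearman rank correlation coefficient (SRCC).
# All latency numbers below are made-up placeholders for illustration.
from scipy.stats import spearmanr

# Hypothetical inference latencies (ms) of five candidate architectures,
# measured on a proxy device and on a new target device.
proxy_latency = [12.4, 15.1, 9.8, 20.3, 17.6]
target_latency = [25.0, 31.2, 19.5, 44.1, 36.8]

# SRCC close to 1 means the two devices rank architectures almost
# identically, so architectures searched on the proxy device can be
# reused on the target device without re-running per-device NAS.
srcc, _ = spearmanr(proxy_latency, target_latency)
print(f"SRCC between proxy and target: {srcc:.3f}")

# Assumed example cutoff: if monotonicity is weak, fall back to adapting
# the proxy (the paper's proxy adaptation idea) or per-device search.
if srcc < 0.9:
    print("Weak latency monotonicity: proxy adaptation may be needed.")
```

In practice, such a rank-correlation check only requires latency measurements for a small sample of architectures on each device, which is far cheaper than building a full latency predictor per device.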