One Proxy Device Is Enough for Hardware-Aware Neural Architecture Search

Bingqian Lu, Jianyi Yang, Weiwen Jiang, Yiyu Shi, Shaolei Ren
{"title":"One Proxy Device Is Enough for Hardware-Aware Neural Architecture Search","authors":"Bingqian Lu, Jianyi Yang, Weiwen Jiang, Yiyu Shi, Shaolei Ren","doi":"10.1145/3491046","DOIUrl":null,"url":null,"abstract":"Convolutional neural networks (CNNs) are used in numerous real-world applications such as vision-based autonomous driving and video content analysis. To run CNN inference on various target devices, hardware-aware neural architecture search (NAS) is crucial. A key requirement of efficient hardware-aware NAS is the fast evaluation of inference latencies in order to rank different architectures. While building a latency predictor for each target device has been commonly used in state of the art, this is a very time-consuming process, lacking scalability in the presence of extremely diverse devices. In this work, we address the scalability challenge by exploiting latency monotonicity --- the architecture latency rankings on different devices are often correlated. When strong latency monotonicity exists, we can re-use architectures searched for one proxy device on new target devices, without losing optimality. In the absence of strong latency monotonicity, we propose an efficient proxy adaptation technique to significantly boost the latency monotonicity. Finally, we validate our approach and conduct experiments with devices of different platforms on multiple mainstream search spaces, including MobileNet-V2, MobileNet-V3, NAS-Bench-201, ProxylessNAS and FBNet. Our results highlight that, by using just one proxy device, we can find almost the same Pareto-optimal architectures as the existing per-device NAS, while avoiding the prohibitive cost of building a latency predictor for each device.","PeriodicalId":426760,"journal":{"name":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","volume":"42 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3491046","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 12

Abstract

Convolutional neural networks (CNNs) are used in numerous real-world applications such as vision-based autonomous driving and video content analysis. To run CNN inference on various target devices, hardware-aware neural architecture search (NAS) is crucial. A key requirement of efficient hardware-aware NAS is the fast evaluation of inference latencies in order to rank different architectures. While building a latency predictor for each target device is common practice in the state of the art, it is a very time-consuming process that lacks scalability in the presence of extremely diverse devices. In this work, we address the scalability challenge by exploiting latency monotonicity: the architecture latency rankings on different devices are often correlated. When strong latency monotonicity exists, we can reuse architectures searched on one proxy device for new target devices without losing optimality. In the absence of strong latency monotonicity, we propose an efficient proxy adaptation technique that significantly boosts latency monotonicity. Finally, we validate our approach and conduct experiments with devices from different platforms on multiple mainstream search spaces, including MobileNet-V2, MobileNet-V3, NAS-Bench-201, ProxylessNAS, and FBNet. Our results highlight that, by using just one proxy device, we can find almost the same Pareto-optimal architectures as existing per-device NAS, while avoiding the prohibitive cost of building a latency predictor for each device.
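To make the notion of latency monotonicity concrete: two devices are strongly monotonic if they rank candidate architectures in (nearly) the same order by inference latency, which the Spearman rank correlation coefficient (SRCC) captures naturally. Below is a minimal illustrative sketch of this check; the latency values are invented placeholders, not measurements from the paper, and the SRCC threshold shown is only an assumed example cutoff.

```python
# A minimal sketch: quantify latency monotonicity between a proxy device
# and a target device via the Spearman rank correlation coefficient (SRCC).
# All latency numbers below are made-up placeholders for illustration.
from scipy.stats import spearmanr

# Hypothetical inference latencies (ms) of five candidate architectures,
# measured on a proxy device and on a new target device.
proxy_latency = [12.4, 15.1, 9.8, 20.3, 17.6]
target_latency = [25.0, 31.2, 19.5, 44.1, 36.8]

# SRCC close to 1 means the two devices rank architectures almost
# identically, so architectures searched on the proxy device can be
# reused on the target device without re-running per-device NAS.
srcc, _ = spearmanr(proxy_latency, target_latency)
print(f"SRCC between proxy and target: {srcc:.3f}")

# Assumed example cutoff: if monotonicity is weak, fall back to adapting
# the proxy (the paper's proxy adaptation idea) or per-device search.
if srcc < 0.9:
    print("Weak latency monotonicity: proxy adaptation may be needed.")
```

In practice, such a rank-correlation check only requires latency measurements for a small sample of architectures on each device, which is far cheaper than building a full latency predictor per device.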