Enhancing Neural Architecture Search With Multiple Hardware Constraints for Deep Learning Model Deployment on Tiny IoT Devices

Impact Factor: 5.1 | CAS Tier 2 (Computer Science) | JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS
Alessio Burrello, Matteo Risso, Beatrice Alessandra Motetti, Enrico Macii, Luca Benini, Daniele Jahier Pagliari
{"title":"Enhancing Neural Architecture Search With Multiple Hardware Constraints for Deep Learning Model Deployment on Tiny IoT Devices","authors":"Alessio Burrello;Matteo Risso;Beatrice Alessandra Motetti;Enrico Macii;Luca Benini;Daniele Jahier Pagliari","doi":"10.1109/TETC.2023.3322033","DOIUrl":null,"url":null,"abstract":"The rapid proliferation of computing domains relying on Internet of Things (IoT) devices has created a pressing need for efficient and accurate deep-learning (DL) models that can run on low-power devices. However, traditional DL models tend to be too complex and computationally intensive for typical IoT end-nodes. To address this challenge, Neural Architecture Search (NAS) has emerged as a popular design automation technique for co-optimizing the accuracy and complexity of deep neural networks. Nevertheless, existing NAS techniques require many iterations to produce a network that adheres to specific hardware constraints, such as the maximum memory available on the hardware or the maximum latency allowed by the target application. In this work, we propose a novel approach to incorporate multiple constraints into so-called Differentiable NAS optimization methods, which allows the generation, in a single shot, of a model that respects user-defined constraints on both memory and latency in a time comparable to a single standard training. The proposed approach is evaluated on five IoT-relevant benchmarks, including the MLPerf Tiny suite and Tiny ImageNet, demonstrating that, with a single search, it is possible to reduce memory and latency by 87.4% and 54.2%, respectively (as defined by our targets), while ensuring non-inferior accuracy on state-of-the-art hand-tuned deep neural networks for TinyML.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"12 3","pages":"780-794"},"PeriodicalIF":5.1000,"publicationDate":"2023-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Emerging Topics in Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10278089/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

The rapid proliferation of computing domains relying on Internet of Things (IoT) devices has created a pressing need for efficient and accurate deep-learning (DL) models that can run on low-power devices. However, traditional DL models tend to be too complex and computationally intensive for typical IoT end-nodes. To address this challenge, Neural Architecture Search (NAS) has emerged as a popular design automation technique for co-optimizing the accuracy and complexity of deep neural networks. Nevertheless, existing NAS techniques require many iterations to produce a network that adheres to specific hardware constraints, such as the maximum memory available on the hardware or the maximum latency allowed by the target application. In this work, we propose a novel approach to incorporate multiple constraints into so-called Differentiable NAS optimization methods, which allows the generation, in a single shot, of a model that respects user-defined constraints on both memory and latency in a time comparable to a single standard training. The proposed approach is evaluated on five IoT-relevant benchmarks, including the MLPerf Tiny suite and Tiny ImageNet, demonstrating that, with a single search, it is possible to reduce memory and latency by 87.4% and 54.2%, respectively (relative to our defined targets), while achieving accuracy non-inferior to state-of-the-art hand-tuned deep neural networks for TinyML.
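The core idea summarized above, steering a differentiable architecture search toward user-defined memory and latency budgets within a single training run, is typically realized by adding constraint-dependent penalty terms to the search loss. The sketch below illustrates that general pattern only; the function name, the hinge-style penalties, and the weighting factors are assumptions for illustration, not the paper's exact formulation. The memory and latency inputs are assumed to be differentiable estimates produced by the NAS framework (e.g., weighted sums over candidate operations).

import torch

def constrained_dnas_loss(task_loss: torch.Tensor,
                          memory: torch.Tensor,
                          latency: torch.Tensor,
                          memory_target: float,
                          latency_target: float,
                          lambda_mem: float = 1.0,
                          lambda_lat: float = 1.0) -> torch.Tensor:
    # Normalized hinge penalties: zero whenever the estimated cost is
    # already below its target, positive (and differentiable) otherwise.
    mem_penalty = torch.clamp(memory - memory_target, min=0.0) / memory_target
    lat_penalty = torch.clamp(latency - latency_target, min=0.0) / latency_target
    # Total search objective: task loss plus weighted constraint violations.
    return task_loss + lambda_mem * mem_penalty + lambda_lat * lat_penalty

Because each penalty vanishes once its constraint is satisfied, a search driven by such a loss spends its remaining capacity on accuracy inside the feasible region, which is consistent with the single-shot, constraint-respecting behavior the abstract describes.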
Source Journal: IEEE Transactions on Emerging Topics in Computing (Computer Science, miscellaneous)
CiteScore: 12.10
Self-citation rate: 5.10%
Articles published: 113
Journal scope: IEEE Transactions on Emerging Topics in Computing publishes papers on emerging aspects of computer science, computing technology, and computing applications not currently covered by other IEEE Computer Society Transactions. Examples of emerging topics in computing include: IT for Green, synthetic and organic computing structures and systems, advanced analytics, social/occupational computing, location-based/client computer systems, morphic computer design, electronic game systems, and health-care IT.