A Practical Framework for Designing and Deploying Tiny Deep Neural Networks on Microcontrollers

Brenda Zhuang, Danilo Pau
{"title":"在微控制器上设计和部署微型深度神经网络的实用框架","authors":"Brenda Zhuang, Danilo Pau","doi":"10.1109/ICCE59016.2024.10444435","DOIUrl":null,"url":null,"abstract":"For many applications, Deep Neural Networks (DNNs) trained on powerful CPUs and GPUs are expected to efficiently perform inference on tiny devices. However, deploying productively un-constrained complex models to microcontrollers (MCUs) remains a time-consuming task. In this paper, a comprehensive methodology is presented that combines advanced optimization techniques in hyperparameter search, model compression, and deployability evaluation using benchmark data.MCUs typically have low-power processors, limited embedded RAM memory and FLASH storage, providing orders of magnitude fewer computational resources than what cloud assets offer. Designing DNNs for such platforms requires effective strategies to balance high accuracy performance with low memory usage and inference latency. To address this challenge, Bayesian optimization has been applied, a powerful complexity-bounded technique, to hyperparameter tuning to select tiny model architecture candidates. Several pruning and quantization methods have been developed to compress all the models and evaluated the numerical performance after compression. Additionally, cloud-based deployment tools have been utilized to iteratively validate the on-device memory and latency performance on off-the-shelf MCUs. Through evaluating the benchmarks against the stringent requirements of tiny devices at the edge, practical insights have been gained into these models.Multiple image classification applications have been applied on a variety of STM32 MCUs. The practical framework can: a) maintain top-1 classification accuracy within tolerance from the floating-point network after compression; b) reduce memory footprint by at least 4 times; c) reduce inference runtime significantly by avoiding external RAM usage; d) adaptable to many different applications.","PeriodicalId":518694,"journal":{"name":"2024 IEEE International Conference on Consumer Electronics (ICCE)","volume":"88 12","pages":"1-6"},"PeriodicalIF":0.0000,"publicationDate":"2024-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Practical Framework for Designing and Deploying Tiny Deep Neural Networks on Microcontrollers\",\"authors\":\"Brenda Zhuang, Danilo Pau\",\"doi\":\"10.1109/ICCE59016.2024.10444435\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"For many applications, Deep Neural Networks (DNNs) trained on powerful CPUs and GPUs are expected to efficiently perform inference on tiny devices. However, deploying productively un-constrained complex models to microcontrollers (MCUs) remains a time-consuming task. In this paper, a comprehensive methodology is presented that combines advanced optimization techniques in hyperparameter search, model compression, and deployability evaluation using benchmark data.MCUs typically have low-power processors, limited embedded RAM memory and FLASH storage, providing orders of magnitude fewer computational resources than what cloud assets offer. Designing DNNs for such platforms requires effective strategies to balance high accuracy performance with low memory usage and inference latency. To address this challenge, Bayesian optimization has been applied, a powerful complexity-bounded technique, to hyperparameter tuning to select tiny model architecture candidates. 
Several pruning and quantization methods have been developed to compress all the models and evaluated the numerical performance after compression. Additionally, cloud-based deployment tools have been utilized to iteratively validate the on-device memory and latency performance on off-the-shelf MCUs. Through evaluating the benchmarks against the stringent requirements of tiny devices at the edge, practical insights have been gained into these models.Multiple image classification applications have been applied on a variety of STM32 MCUs. The practical framework can: a) maintain top-1 classification accuracy within tolerance from the floating-point network after compression; b) reduce memory footprint by at least 4 times; c) reduce inference runtime significantly by avoiding external RAM usage; d) adaptable to many different applications.\",\"PeriodicalId\":518694,\"journal\":{\"name\":\"2024 IEEE International Conference on Consumer Electronics (ICCE)\",\"volume\":\"88 12\",\"pages\":\"1-6\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-01-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2024 IEEE International Conference on Consumer Electronics (ICCE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCE59016.2024.10444435\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2024 IEEE International Conference on Consumer Electronics (ICCE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCE59016.2024.10444435","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

For many applications, Deep Neural Networks (DNNs) trained on powerful CPUs and GPUs are expected to perform inference efficiently on tiny devices. However, productively deploying complex, resource-unconstrained models to microcontrollers (MCUs) remains a time-consuming task. In this paper, a comprehensive methodology is presented that combines advanced optimization techniques in hyperparameter search, model compression, and deployability evaluation using benchmark data. MCUs typically have low-power processors and limited embedded RAM and FLASH storage, providing orders of magnitude fewer computational resources than cloud assets. Designing DNNs for such platforms requires effective strategies to balance high accuracy with low memory usage and inference latency. To address this challenge, Bayesian optimization, a powerful complexity-bounded technique, has been applied to hyperparameter tuning to select tiny candidate model architectures. Several pruning and quantization methods have been developed to compress the models, and the numerical performance after compression has been evaluated. Additionally, cloud-based deployment tools have been used to iteratively validate on-device memory and latency performance on off-the-shelf MCUs. By evaluating the benchmarks against the stringent requirements of tiny devices at the edge, practical insights have been gained into these models. Multiple image classification applications have been deployed on a variety of STM32 MCUs. The practical framework can: a) maintain top-1 classification accuracy within tolerance of the floating-point network after compression; b) reduce the memory footprint by at least 4 times; c) reduce inference runtime significantly by avoiding external RAM usage; d) adapt to many different applications.
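The abstract does not name the tooling behind its Bayesian hyperparameter search, so the sketch below is only a minimal illustration of the idea using KerasTuner's BayesianOptimization tuner; the search ranges, model shape, and objective are hypothetical, not taken from the paper.

```python
# A minimal sketch of Bayesian hyperparameter search over a tiny CNN.
# The search space and dataset are illustrative assumptions, not the
# paper's actual configuration.
import keras
import keras_tuner


def build_model(hp):
    """Build a small CNN whose depth and width are chosen by the tuner."""
    model = keras.Sequential([keras.layers.Input(shape=(32, 32, 3))])
    for i in range(hp.Int("conv_blocks", 1, 3)):
        model.add(keras.layers.Conv2D(
            filters=hp.Int(f"filters_{i}", 8, 32, step=8),  # small widths keep the model tiny
            kernel_size=3, padding="same", activation="relu"))
        model.add(keras.layers.MaxPooling2D())
    model.add(keras.layers.GlobalAveragePooling2D())
    model.add(keras.layers.Dense(10, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


tuner = keras_tuner.BayesianOptimization(
    build_model,
    objective="val_accuracy",  # could also be a custom accuracy/size trade-off
    max_trials=20,             # bounds the cost of the search
    directory="tuning",
    project_name="tiny_cnn")

# x_train, y_train, x_val, y_val are assumed to be loaded elsewhere:
# tuner.search(x_train, y_train, epochs=10, validation_data=(x_val, y_val))
# best_model = tuner.get_best_models(num_models=1)[0]
```

Bounding `max_trials` and the per-layer filter ranges is one way to keep the search itself complexity-bounded, in the spirit the abstract describes.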
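The specific pruning methods are likewise not detailed in the abstract. One common candidate, shown here purely as an assumed example, is gradual magnitude pruning via the TensorFlow Model Optimization Toolkit; the sparsity schedule values are illustrative.

```python
# A sketch of gradual magnitude pruning, one plausible instance of the
# "several pruning methods" the paper evaluates (schedule values are
# illustrative assumptions).
import tensorflow_model_optimization as tfmot


def make_pruned(model, train_steps):
    """Wrap a trained Keras model so its weights are pruned to 75% sparsity."""
    schedule = tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0,
        final_sparsity=0.75,   # illustrative target sparsity
        begin_step=0,
        end_step=train_steps)
    pruned = tfmot.sparsity.keras.prune_low_magnitude(
        model, pruning_schedule=schedule)
    pruned.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
    return pruned

# Fine-tune with the pruning callback, then strip the pruning wrappers:
# pruned = make_pruned(best_model, train_steps=1000)
# pruned.fit(x_train, y_train, epochs=2,
#            callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])
# final_model = tfmot.sparsity.keras.strip_pruning(pruned)
```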
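Full-integer post-training quantization alone accounts for a roughly 4x footprint reduction (int8 weights versus float32), consistent with the paper's "at least 4 times" claim. The TFLite converter sketch below is an assumed toolchain, since deployment tools for STM32 MCUs such as STM32Cube.AI accept quantized TFLite models; the calibration data is a placeholder.

```python
# A sketch of full-integer post-training quantization to int8.
# The representative dataset here is a placeholder assumption.
import numpy as np
import tensorflow as tf


def quantize_int8(model, calib_images):
    """Convert a Keras model to a fully int8-quantized TFLite flatbuffer."""
    def rep_data():
        # Calibration samples let the converter pick activation ranges.
        for img in calib_images[:100]:
            yield [np.expand_dims(img.astype(np.float32), axis=0)]

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = rep_data
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8   # int8 end to end, as MCU
    converter.inference_output_type = tf.int8  # runtimes typically expect
    return converter.convert()

# tflite_bytes = quantize_int8(final_model, calib_images)
# with open("model_int8.tflite", "wb") as f:
#     f.write(tflite_bytes)
```

A flatbuffer produced this way can then be profiled for FLASH, RAM, and latency on off-the-shelf STM32 boards, matching the iterative on-device validation loop the abstract describes.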