{"title":"面向fpga加速边缘人工智能应用的快速平台感知神经架构搜索","authors":"Yi-Chuan Liang, Ying-Chiao Liao, Chen-Ching Lin, Shih-Hao Hung","doi":"10.1145/3400286.3418240","DOIUrl":null,"url":null,"abstract":"Neural Architecture Search (NAS) is a technique for finding suitable neural network architecture models for given applications. Previously, such search methods are usually based on reinforcement learning, with a recurrent neural network to generate neural network models. However, most NAS methods aim to find a set of candidates with best cost-performance ratios, e.g. high accuracy and low computing time, based on rough estimates derived from the workload generically. As today's deep learning chips accelerate neural network operations with a variety of hardware tricks such as vectors and low-precision data formats, the estimated metrics derived from generic computing operations such as float-point operations (FLOPS) would be very different from the actual latency, throughput, power consumption, etc., which are highly sensitive to the hardware design and even the software optimization in edge AI applications. Thus, instead of taking a long time to pick and train so called good candidates repeatedly based on unreliable estimates, we propose a NAS framework which accelerates the search process by including the actual performance measurements in the search process. The inclusion of actual measurements enables the proposed NAS framework to find candidates based on correct information and reduce the possibility of selecting wrong candidates and wasting search time on wrong candidates. To illustrate the effectiveness of our framework, we prototyped the framework to work with Intel OpenVINO and Field Programmable Gate Arrays (FPGA) to meet the accuracy and latency required by the user. The framework takes the dataset, accuracy and latency requirements from the user and automatically search for candidates to meet the requirements. Case studies and experimental results are presented in this paper to evaluate the effectiveness of our framework for Edge AI applications in real-time image classification.","PeriodicalId":326100,"journal":{"name":"Proceedings of the International Conference on Research in Adaptive and Convergent Systems","volume":"37 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Toward Fast Platform-Aware Neural Architecture Search for FPGA-Accelerated Edge AI Applications\",\"authors\":\"Yi-Chuan Liang, Ying-Chiao Liao, Chen-Ching Lin, Shih-Hao Hung\",\"doi\":\"10.1145/3400286.3418240\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Neural Architecture Search (NAS) is a technique for finding suitable neural network architecture models for given applications. Previously, such search methods are usually based on reinforcement learning, with a recurrent neural network to generate neural network models. However, most NAS methods aim to find a set of candidates with best cost-performance ratios, e.g. high accuracy and low computing time, based on rough estimates derived from the workload generically. 
As today's deep learning chips accelerate neural network operations with a variety of hardware tricks such as vectors and low-precision data formats, the estimated metrics derived from generic computing operations such as float-point operations (FLOPS) would be very different from the actual latency, throughput, power consumption, etc., which are highly sensitive to the hardware design and even the software optimization in edge AI applications. Thus, instead of taking a long time to pick and train so called good candidates repeatedly based on unreliable estimates, we propose a NAS framework which accelerates the search process by including the actual performance measurements in the search process. The inclusion of actual measurements enables the proposed NAS framework to find candidates based on correct information and reduce the possibility of selecting wrong candidates and wasting search time on wrong candidates. To illustrate the effectiveness of our framework, we prototyped the framework to work with Intel OpenVINO and Field Programmable Gate Arrays (FPGA) to meet the accuracy and latency required by the user. The framework takes the dataset, accuracy and latency requirements from the user and automatically search for candidates to meet the requirements. Case studies and experimental results are presented in this paper to evaluate the effectiveness of our framework for Edge AI applications in real-time image classification.\",\"PeriodicalId\":326100,\"journal\":{\"name\":\"Proceedings of the International Conference on Research in Adaptive and Convergent Systems\",\"volume\":\"37 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-10-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the International Conference on Research in Adaptive and Convergent Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3400286.3418240\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the International Conference on Research in Adaptive and Convergent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3400286.3418240","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Toward Fast Platform-Aware Neural Architecture Search for FPGA-Accelerated Edge AI Applications
Neural Architecture Search (NAS) is a technique for finding suitable neural network architectures for a given application. Such search methods have typically been based on reinforcement learning, using a recurrent neural network to generate candidate network models. However, most NAS methods aim to find a set of candidates with the best cost-performance ratios, e.g. high accuracy and low computing time, based on rough, platform-agnostic estimates of the workload. Because today's deep learning chips accelerate neural network operations with a variety of hardware techniques, such as vector units and low-precision data formats, metrics estimated from generic operation counts such as floating-point operations (FLOPs) can differ greatly from the actual latency, throughput, and power consumption, which are highly sensitive to the hardware design and even to the software optimization in edge AI applications. Thus, instead of repeatedly picking and training so-called good candidates based on unreliable estimates, we propose a NAS framework that accelerates the search by incorporating actual performance measurements into the search process. These measurements allow the framework to evaluate candidates with correct information, reducing the chance of selecting poor candidates and wasting search time on them. To illustrate the effectiveness of our framework, we prototyped it with Intel OpenVINO and Field Programmable Gate Arrays (FPGAs) to meet user-specified accuracy and latency requirements. The framework takes the dataset and the accuracy and latency requirements from the user and automatically searches for candidates that meet them. Case studies and experimental results are presented to evaluate the effectiveness of the framework for edge AI applications in real-time image classification.
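
To make the measurement-in-the-loop idea concrete, below is a minimal sketch of such a search loop in Python. It is not the authors' implementation: the functions `sample_architecture`, `train_and_evaluate`, and `measure_latency_ms` are hypothetical placeholders, and toy proxies stand in for real training and for timing a candidate compiled with OpenVINO on the target FPGA. The point it illustrates is that latency is measured on the actual platform and checked against the user's requirement before any training effort is invested.

```python
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    arch: dict          # hypothetical architecture description
    accuracy: float     # validation accuracy after training
    latency_ms: float   # latency measured on the actual target device


def sample_architecture(rng: random.Random) -> dict:
    """Hypothetical stand-in for the NAS controller (e.g. an RNN policy)."""
    return {
        "depth": rng.choice([8, 12, 16]),
        "width": rng.choice([32, 64, 128]),
        "kernel": rng.choice([3, 5]),
    }


def train_and_evaluate(arch: dict) -> float:
    """Placeholder for training the candidate on the user's dataset; a toy
    proxy stands in here so the sketch runs end to end."""
    return min(0.99, 0.50 + 0.02 * arch["depth"] + 0.001 * arch["width"])


def measure_latency_ms(arch: dict) -> float:
    """Placeholder for compiling the candidate (e.g. via OpenVINO) and timing
    real inference on the target FPGA; a toy proxy stands in here."""
    return 0.05 * arch["depth"] * arch["width"] * arch["kernel"] / 3.0


def search(min_accuracy: float, max_latency_ms: float,
           budget: int, seed: int = 0) -> list:
    """Return candidates whose measured latency and accuracy meet the
    user-supplied requirements."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(budget):
        arch = sample_architecture(rng)
        # Measure latency on the real device first: the measurement is cheap
        # compared with training, so candidates that are too slow are
        # discarded before any training time is spent on them.
        latency = measure_latency_ms(arch)
        if latency > max_latency_ms:
            continue
        accuracy = train_and_evaluate(arch)
        if accuracy >= min_accuracy:
            accepted.append(Candidate(arch, accuracy, latency))
    return accepted


if __name__ == "__main__":
    winners = search(min_accuracy=0.70, max_latency_ms=30.0, budget=50)
    for c in winners:
        print(c)
```

Measuring on the device before training mirrors the framework's stated goal: search time is not wasted on candidates that cannot meet the latency requirement, no matter how accurate they might eventually become.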