Automated Hardware and Neural Network Architecture co-design of FPGA accelerators using multi-objective Neural Architecture Search

2020 IEEE 10th International Conference on Consumer Electronics (ICCE-Berlin). Pub Date: 2020-11-09. DOI: 10.1109/ICCE-Berlin50680.2020.9352153

Philip Colangelo, Oren Segal, Alexander Speicher, M. Margala



Abstract

State-of-the-art Neural Network Architectures (NNAs) are challenging to design and implement efficiently in hardware. In the past couple of years, this has led to an explosion in research and development of automatic Neural Architecture Search (NAS) tools. AutoML tools are now used to achieve state-of-the-art NNA designs and to attempt to optimize for hardware usage and design. Much of the recent research in the auto-design of NNAs has focused on convolutional networks and image recognition, ignoring the fact that a significant part of the workload in data centers is general-purpose deep neural networks. In this work, we develop and test a general multilayer perceptron (MLP) flow that can take arbitrary datasets as input and automatically produce optimized NNAs and hardware designs. We test the flow on six benchmarks. Our results show that we exceed currently published MLP accuracy results and are competitive with non-MLP-based results. We compare general and common GPU architectures with our scalable FPGA design and show that we can achieve higher efficiency and higher throughput (outputs per second) for the majority of datasets. Further insights into the design space for both accurate networks and high-performing hardware show the power of co-design by correlating accuracy with throughput and network size with accuracy, and by scaling to high-performance devices.
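As an illustration of what "multi-objective" means when accuracy and throughput are optimized together, the following is a minimal sketch of Pareto-front selection over candidate MLP designs. It is not the authors' search procedure, which the abstract does not describe; the `Candidate` type, the candidate names, and all numeric values are hypothetical and chosen only to show the dominance test.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical candidate design; the values used below are illustrative only."""
    name: str
    accuracy: float    # fraction correct, higher is better
    throughput: float  # outputs per second, higher is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is at least as good as b on both objectives
    and strictly better on at least one."""
    at_least_as_good = a.accuracy >= b.accuracy and a.throughput >= b.throughput
    strictly_better = a.accuracy > b.accuracy or a.throughput > b.throughput
    return at_least_as_good and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up candidates: (name, accuracy, throughput).
candidates = [
    Candidate("mlp_small", 0.90, 5000.0),
    Candidate("mlp_medium", 0.94, 2000.0),
    Candidate("mlp_large", 0.95, 500.0),
    Candidate("mlp_poor", 0.89, 1500.0),   # worse than mlp_small on both objectives
]

front = pareto_front(candidates)
print([c.name for c in front])
```

In this sketch, `mlp_poor` is dropped because another candidate beats it on both objectives, while the remaining three each represent a different accuracy/throughput trade-off. A co-design flow would pick among such non-dominated points according to the target device and application.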


