TRAC: Compilation-Based Design of Transformer Accelerators for FPGAs

Patrick Plagwitz, Frank Hannig, Jürgen Teich
{"title":"TRAC: Compilation-Based Design of Transformer Accelerators for FPGAs","authors":"Patrick Plagwitz, Frank Hannig, Jürgen Teich","doi":"10.1109/FPL57034.2022.00015","DOIUrl":null,"url":null,"abstract":"Transformer-type Neural Networks (NNs) have shown impressive accuracy numbers in Natural Language Processing (NLP) applications where Recurrent Neural Networks (RNNs) have been in use before, even surpassing them. However, differing considerably from common types of NNs, existing accelerator designs, particularly for Field-Programmable Gate Arrays (FPGAs), cannot be used to implement them. Previous research has shown FPGAs to be platforms superior to CPUs and even GPUs for accelerating NNs when it comes to energy efficiency. Following the development of automated compiler-based design flows for NNs, there is still a lack of such an approach for transformers and FPGA targets. In this realm, this paper presents a novel compiler called TRAC as well as a library of operators and modules for implementing transformer accelerators on FPGAs. Based on optimization and code generation settings in the compiler using an integrated approach combining weight compression techniques with according adaptations of the accelerator modules, a design space of accelerators is defined and explored. For each design, a system-level data path and control unit architecture is generated, which integrates module-level designs using hierarchical High-Level Synthesis (HLS). We evaluate our implementation for the BERT network and provide results regarding the trade-off between execution time, accuracy, and FPGA resource usage.","PeriodicalId":380116,"journal":{"name":"2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)","volume":"41 12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FPL57034.2022.00015","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Transformer-type Neural Networks (NNs) have achieved impressive accuracy in Natural Language Processing (NLP) applications where Recurrent Neural Networks (RNNs) were previously used, even surpassing them. However, because transformers differ considerably from common types of NNs, existing accelerator designs, particularly for Field-Programmable Gate Arrays (FPGAs), cannot be used to implement them. Previous research has shown FPGAs to be superior to CPUs and even GPUs in energy efficiency when accelerating NNs. Although automated compiler-based design flows for NNs have been developed, such an approach is still lacking for transformers on FPGA targets. To close this gap, this paper presents a novel compiler called TRAC together with a library of operators and modules for implementing transformer accelerators on FPGAs. The compiler's optimization and code generation settings define a design space of accelerators, which is explored using an integrated approach that combines weight compression techniques with corresponding adaptations of the accelerator modules. For each design, a system-level data path and control unit architecture is generated that integrates the module-level designs using hierarchical High-Level Synthesis (HLS). We evaluate our implementation on the BERT network and report results on the trade-off between execution time, accuracy, and FPGA resource usage.
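To make the idea of combining weight compression with matching adaptations of the accelerator modules concrete, below is a minimal sketch, assuming symmetric per-tensor int8 quantization as the compression scheme. The function names (quantize_weights, matvec_q) and the scheme itself are illustrative assumptions, not TRAC's actual interface; the sketch only shows how a compressed weight matrix and a matrix-vector module adapted to consume it could fit together in an operator library of this kind.

```cpp
// Sketch: symmetric per-tensor int8 weight compression plus a matching
// matrix-vector module, illustrating (not reproducing) the paper's idea of
// adapting accelerator modules to the compressed weight format.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Quantize a float weight matrix to int8 with one scale per tensor:
// q = round(w / s), where s = max(|w|) / 127. (Assumed scheme.)
static float quantize_weights(const std::vector<float>& w,
                              std::vector<int8_t>& q) {
    float max_abs = 0.0f;
    for (float v : w) max_abs = std::max(max_abs, std::fabs(v));
    float scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;
    q.resize(w.size());
    for (size_t i = 0; i < w.size(); ++i)
        q[i] = static_cast<int8_t>(std::lround(w[i] / scale));
    return scale;
}

// Matrix-vector product y = W x directly on the compressed weights; the
// single rescale at the end is the kind of module-level adaptation that a
// compressed format requires.
static void matvec_q(const std::vector<int8_t>& q, float scale,
                     const std::vector<float>& x, size_t rows, size_t cols,
                     std::vector<float>& y) {
    y.assign(rows, 0.0f);
    for (size_t r = 0; r < rows; ++r) {
        float acc = 0.0f;
        for (size_t c = 0; c < cols; ++c)
            acc += static_cast<float>(q[r * cols + c]) * x[c];
        y[r] = acc * scale;
    }
}

int main() {
    const size_t rows = 2, cols = 3;
    std::vector<float> w = {0.5f, -1.0f, 0.25f, 0.75f, 0.1f, -0.6f};
    std::vector<float> x = {1.0f, 2.0f, 3.0f};
    std::vector<int8_t> q;
    float scale = quantize_weights(w, q);
    std::vector<float> y;
    matvec_q(q, scale, x, rows, cols, y);
    std::printf("scale=%f y=[%f, %f]\n", scale, y[0], y[1]);
    return 0;
}
```

In a flow like the one described, settings such as the weight bit width (fixed at 8 in this sketch) would be among the compiler knobs that span the execution-time/accuracy/resource trade-off the paper evaluates.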