用于变压器推理加速和优化的 FPGA 和 ASIC 设计概览

IF 3.7 2区计算机科学 Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Journal of Systems Architecture Pub Date : 2024-08-07 DOI:10.1016/j.sysarc.2024.103247

Beom Jin Kang , Hae In Lee , Seok Kyu Yoon, Young Chan Kim, Sang Beom Jeong, Seong Jun O, Hyun Kim

{"title":"用于变压器推理加速和优化的 FPGA 和 ASIC 设计概览","authors":"Beom Jin Kang , Hae In Lee , Seok Kyu Yoon, Young Chan Kim, Sang Beom Jeong, Seong Jun O, Hyun Kim","doi":"10.1016/j.sysarc.2024.103247","DOIUrl":null,"url":null,"abstract":"<div><p>Recently, transformer-based models have achieved remarkable success in various fields, such as computer vision, speech recognition, and natural language processing. However, transformer models require a substantially higher number of parameters and computational operations than conventional neural networks (<em>e.g.</em>, recurrent neural networks, long-short-term memory, and convolutional neural networks). Transformer models are typically processed on graphics processing unit (GPU) platforms specialized for high-performance memory and parallel processing. However, the high power consumption of GPUs poses significant challenges for their deployment in edge device environments with limited battery capacity. To address these issues, research on using field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs) to drive transformer models with low power consumption is underway. FPGAs offer a high level of flexibility, whereas ASICs are beneficial for optimizing throughput and power. Therefore, both platforms are highly suitable for efficiently optimizing matrix multiplication operations, constituting a significant portion of transformer models. In addition, FPGAs and ASICs consume less power than GPUs, making them ideal energy-efficient platforms. This study investigates and analyzes the model compression methods, various optimization techniques, and architectures of accelerators related to FPGA- and ASIC-based transformer designs. We expect this study to serve as a valuable guide for hardware research in the transformer field.</p></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"155 ","pages":"Article 103247"},"PeriodicalIF":3.7000,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A survey of FPGA and ASIC designs for transformer inference acceleration and optimization\",\"authors\":\"Beom Jin Kang , Hae In Lee , Seok Kyu Yoon, Young Chan Kim, Sang Beom Jeong, Seong Jun O, Hyun Kim\",\"doi\":\"10.1016/j.sysarc.2024.103247\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Recently, transformer-based models have achieved remarkable success in various fields, such as computer vision, speech recognition, and natural language processing. However, transformer models require a substantially higher number of parameters and computational operations than conventional neural networks (<em>e.g.</em>, recurrent neural networks, long-short-term memory, and convolutional neural networks). Transformer models are typically processed on graphics processing unit (GPU) platforms specialized for high-performance memory and parallel processing. However, the high power consumption of GPUs poses significant challenges for their deployment in edge device environments with limited battery capacity. To address these issues, research on using field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs) to drive transformer models with low power consumption is underway. FPGAs offer a high level of flexibility, whereas ASICs are beneficial for optimizing throughput and power. Therefore, both platforms are highly suitable for efficiently optimizing matrix multiplication operations, constituting a significant portion of transformer models. In addition, FPGAs and ASICs consume less power than GPUs, making them ideal energy-efficient platforms. This study investigates and analyzes the model compression methods, various optimization techniques, and architectures of accelerators related to FPGA- and ASIC-based transformer designs. We expect this study to serve as a valuable guide for hardware research in the transformer field.</p></div>\",\"PeriodicalId\":50027,\"journal\":{\"name\":\"Journal of Systems Architecture\",\"volume\":\"155 \",\"pages\":\"Article 103247\"},\"PeriodicalIF\":3.7000,\"publicationDate\":\"2024-08-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Systems Architecture\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S138376212400184X\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Systems Architecture","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S138376212400184X","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}

引用次数: 0

摘要

最近，基于变压器的模型在计算机视觉、语音识别和自然语言处理等多个领域取得了显著的成功。然而，与传统神经网络（递归神经网络、长短期记忆和卷积神经网络）相比，变换器模型需要更多的参数和计算操作。变压器模型通常在图形处理器（GPU）平台上进行处理，该平台专门用于高性能内存和并行处理。然而，GPU 的高功耗给其在电池容量有限的边缘设备环境中的部署带来了巨大挑战。为了解决这些问题，利用现场可编程门阵列（FPGA）和特定应用集成电路（ASIC）驱动低功耗变压器模型的研究正在进行中。FPGA 具有高度灵活性，而 ASIC 则有利于优化吞吐量和功耗。因此，这两种平台都非常适合高效优化矩阵乘法运算，而矩阵乘法运算在变压器模型中占很大比重。此外，FPGA 和 ASIC 的功耗低于 GPU，是理想的节能平台。本研究调查并分析了与基于 FPGA 和 ASIC 的变压器设计相关的模型压缩方法、各种优化技术和加速器架构。我们希望本研究能为变压器领域的硬件研究提供有价值的指导。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A survey of FPGA and ASIC designs for transformer inference acceleration and optimization

Recently, transformer-based models have achieved remarkable success in various fields, such as computer vision, speech recognition, and natural language processing. However, transformer models require a substantially higher number of parameters and computational operations than conventional neural networks (e.g., recurrent neural networks, long-short-term memory, and convolutional neural networks). Transformer models are typically processed on graphics processing unit (GPU) platforms specialized for high-performance memory and parallel processing. However, the high power consumption of GPUs poses significant challenges for their deployment in edge device environments with limited battery capacity. To address these issues, research on using field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs) to drive transformer models with low power consumption is underway. FPGAs offer a high level of flexibility, whereas ASICs are beneficial for optimizing throughput and power. Therefore, both platforms are highly suitable for efficiently optimizing matrix multiplication operations, constituting a significant portion of transformer models. In addition, FPGAs and ASICs consume less power than GPUs, making them ideal energy-efficient platforms. This study investigates and analyzes the model compression methods, various optimization techniques, and architectures of accelerators related to FPGA- and ASIC-based transformer designs. We expect this study to serve as a valuable guide for hardware research in the transformer field.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of Systems Architecture 工程技术-计算机：硬件

CiteScore

8.70

自引率

15.60%

发文量

226

审稿时长

46 days

期刊介绍： The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, parallel and distributed architectures as well as additional subjects in the computer and system architecture area will fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software. Design automation of such systems including methodologies, techniques and tools for their design as well as novel designs of software components fall within the scope of this journal. Novel applications that use embedded systems are also central in this journal. While hardware is not a part of this journal hardware/software co-design methods that consider interplay between software and hardware components with and emphasis on software are also relevant here.