BiRD: Bi-Directional Input Reuse Dataflow for Enhancing Depthwise Convolution Performance on Systolic Arrays
Authors: Mingeon Park; Seokjin Hwang; Hyungmin Cho
Journal: IEEE Transactions on Computers, vol. 73, no. 12, pp. 2708-2721 (Impact Factor 3.6; JCR Q2, Computer Science, Hardware & Architecture)
Published: 2024-08-23
DOI: 10.1109/TC.2024.3449103
URL: https://ieeexplore.ieee.org/document/10644120/
Citations: 0
Abstract
Depthwise convolution (DWConv) is an effective technique for reducing the size and computational requirements of convolutional neural networks. However, DWConv's input reuse pattern is not easily transformed into dense matrix multiplications, leading to low utilization of processing elements (PEs) on existing systolic arrays. In this paper, we introduce a novel systolic array dataflow mechanism called BiRD, designed to maximize input reuse and boost DWConv performance. BiRD exploits two directions of input reuse and requires only minor modifications to a typical weight-stationary systolic array. We evaluate BiRD on the Gemmini platform, comparing it with existing dataflow types. The results show that BiRD significantly reduces computation time while incurring minimal area overhead and consuming less energy than other dataflow types. For example, on a 32 $\times$ 32 systolic array, it incurs a 9.8% area overhead, significantly smaller than that of other dataflow types for DWConv. Compared to matrix multiplication-based DWConv, BiRD achieves a 4.7$\times$ performance improvement for DWConv layers of MobileNet-V2, resulting in a 55.8% reduction in total inference computation time and a 44.9% reduction in energy consumption. Our results highlight the effectiveness of BiRD in enhancing the performance of DWConv on systolic arrays.
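To make the utilization problem concrete, the following is a minimal pure-Python sketch of depthwise convolution computed directly and via a per-channel im2col matrix-vector product. It is not BiRD and not the paper's baseline implementation; the function names, the stride-1/no-padding setting, and the `weight_tile_occupancy` model (one filter mapped to a single column of a weight-stationary array) are all assumptions chosen for illustration.

```python
def depthwise_conv2d(x, w):
    """Direct depthwise convolution (stride 1, no padding).

    x[c][i][j] is the input; w[c][u][v] is the single k-by-k filter of channel c.
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    k = len(w[0])
    out_h, out_w = height - k + 1, width - k + 1
    return [
        [
            [
                sum(x[c][i + u][j + v] * w[c][u][v]
                    for u in range(k) for v in range(k))
                for j in range(out_w)
            ]
            for i in range(out_h)
        ]
        for c in range(channels)
    ]


def im2col(channel, k):
    """Unroll every k-by-k window of one channel into a row of length k*k."""
    height, width = len(channel), len(channel[0])
    return [
        [channel[i + u][j + v] for u in range(k) for v in range(k)]
        for i in range(height - k + 1)
        for j in range(width - k + 1)
    ]


def depthwise_conv2d_matmul(x, w):
    """Same result, expressed as one matrix-vector product per channel."""
    k = len(w[0])
    out_w = len(x[0][0]) - k + 1
    out = []
    for c in range(len(x)):
        cols = im2col(x[c], k)  # shape: (out_h * out_w) x (k * k)
        weights = [w[c][u][v] for u in range(k) for v in range(k)]  # k*k x 1
        flat = [sum(a * b for a, b in zip(row, weights)) for row in cols]
        out.append([flat[r:r + out_w] for r in range(0, len(flat), out_w)])
    return out


def weight_tile_occupancy(k, array_dim):
    """Hypothetical model: share of an array_dim x array_dim weight tile
    occupied when a single k*k x 1 filter column is loaded."""
    return min(k * k, array_dim) / (array_dim * array_dim)


# Small check: both formulations agree on a 2-channel, 4x4 input.
x = [[[c + i * 4 + j for j in range(4)] for i in range(4)] for c in range(2)]
w = [[[1, 0, -1]] * 3, [[0, 1, 0]] * 3]
assert depthwise_conv2d(x, w) == depthwise_conv2d_matmul(x, w)
print(weight_tile_occupancy(3, 32))
```

Because each channel has its own filter, the matrix formulation degenerates into a matrix-vector product with a single weight column per channel, and the overlapping windows in `im2col` duplicate input values instead of reusing them. Under the simple mapping assumed here, a 3 $\times$ 3 filter fills only 9 of the 1024 weight positions of a 32 $\times$ 32 array; real mappings may pack channels differently, so this figure is only indicative.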
About the journal:
The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.