Hyunjun Kim, Whoi Ree Ha, Yongseok Lee, Dongju Lee, Jongwon Lee, Deumji Woo, Jonghee Yoon, Jemin Lee, Yongin Kwon, Yunheung Paek
Title: Towards an efficient dataflow-flexible accelerator by finding optimal dataflows of DNNs
Journal: Future Generation Computer Systems-The International Journal of Escience, volume 176, Article 108123 (Impact Factor 6.2; JCR Q1, Computer Science, Theory & Methods)
DOI: 10.1016/j.future.2025.108123
Published: 2025-09-11
URL: https://www.sciencedirect.com/science/article/pii/S0167739X25004170
Citations: 0
Abstract
This paper proposes a new dataflow-flexible accelerator design that addresses the limitations of existing heterogeneous dataflow accelerators (HDAs) in handling the computation of multiple deep neural network (DNN) models. The design offers greater dataflow flexibility and higher efficiency than existing works. The accelerator implements a fixed set of representative dataflows as operating modes and switches between them dynamically. A design space exploration (DSE) tool is used to evaluate the efficiency of candidate dataflows and to determine the optimal number and types of operating modes. Each layer of the target DNN models is assessed under the different operating modes to select the optimal mode for that layer. In addition, two supplementary optimization techniques are adopted to reduce the overheads of supporting a multitude of dataflows. The first minimizes the number of dataflow transitions, which incur severe overheads. The second maximizes the reuse of hardware components needed to support multiple dataflows. By identifying redundant hardware components, the proposed design minimizes chip area, another aspect in which dataflow-flexible accelerators suffer. Experimental results demonstrate that our algorithm achieves greater dataflow flexibility with high efficiency. Compared to HDA, our design is, on average, 34.6% lower in latency at the cost of 6.4% more area and negligible energy overhead.
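The abstract does not spell out how per-layer mode selection and transition minimization are combined. As a minimal sketch only, and not the authors' algorithm, one way to trade per-layer efficiency against transition overhead is a dynamic program over layers. Everything named here is an assumption for illustration: the function `select_modes`, the `layer_cost` table (which in the paper's setting would come from a DSE tool), and the single uniform `switch_cost`.

```python
from typing import List, Sequence


def select_modes(layer_cost: Sequence[Sequence[float]], switch_cost: float) -> List[int]:
    """Pick one operating mode per layer, minimizing the sum of per-layer
    costs plus a fixed penalty for every mode transition between layers.

    layer_cost[i][m] is a hypothetical cost (e.g. latency) of layer i in mode m.
    switch_cost is a hypothetical, uniform penalty for changing modes.
    """
    num_layers = len(layer_cost)
    num_modes = len(layer_cost[0])

    # best[m]: minimal total cost of layers 0..i with layer i run in mode m.
    best = list(layer_cost[0])
    # back[i - 1][m]: mode of layer i - 1 on the best path ending at (i, m).
    back: List[List[int]] = []

    for i in range(1, num_layers):
        new_best = []
        choices = []
        for m in range(num_modes):
            # Cheapest predecessor mode, paying the penalty only on a change.
            prev = min(
                range(num_modes),
                key=lambda p: best[p] + (0.0 if p == m else switch_cost),
            )
            penalty = 0.0 if prev == m else switch_cost
            new_best.append(best[prev] + penalty + layer_cost[i][m])
            choices.append(prev)
        best = new_best
        back.append(choices)

    # Walk back from the cheapest final mode to recover the per-layer plan.
    mode = min(range(num_modes), key=lambda m: best[m])
    plan = [mode]
    for choices in reversed(back):
        mode = choices[mode]
        plan.append(mode)
    plan.reverse()
    return plan


# Made-up costs for three layers and two modes.
costs = [[1.0, 5.0], [4.0, 3.0], [1.0, 5.0]]
free_switching = select_modes(costs, switch_cost=0.0)
costly_switching = select_modes(costs, switch_cost=10.0)
```

With free transitions, the plan simply follows the cheapest mode for each layer; with a large penalty, it stays in one mode even though the middle layer would be slightly faster in the other. This is the kind of trade-off the transition-minimizing optimization targets.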
About the journal:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.