Towards an efficient dataflow-flexible accelerator by finding optimal dataflows of DNNs

IF 6.2 · Zone 2, Computer Science · Q1, COMPUTER SCIENCE, THEORY & METHODS

Future Generation Computer Systems-The International Journal of Escience · Pub Date: 2025-09-11 · DOI: 10.1016/j.future.2025.108123

Hyunjun Kim, Whoi Ree Ha, Yongseok Lee, Dongju Lee, Jongwon Lee, Deumji Woo, Jonghee Yoon, Jemin Lee, Yongin Kwon, Yunheung Paek

Volume 176, Article 108123 · https://www.sciencedirect.com/science/article/pii/S0167739X25004170

Citations: 0

Abstract

This paper proposes a new dataflow-flexible accelerator design that addresses the limitations of existing heterogeneous dataflow accelerators (HDAs) in handling the computation of multiple deep neural network (DNN) models. The design offers greater dataflow flexibility and higher efficiency than existing works. The accelerator implements a fixed set of representative dataflows as operating modes and switches between them dynamically. A design space exploration (DSE) tool is used to evaluate the efficiency of candidate dataflows and to determine the optimal number and types of operating modes. Each layer of the target DNN models is assessed under the different operating modes to select the optimal mode for that layer. In addition, two supplementary optimization techniques are adopted to reduce the overheads of supporting a multitude of dataflows. The first minimizes the number of dataflow transitions, which incur severe overheads. The second maximizes the reuse of hardware components associated with supporting multiple dataflows. By identifying redundant hardware components, the proposed design minimizes chip area, another aspect in which dataflow-flexible accelerators suffer. Experimental results demonstrate that our algorithm achieves greater dataflow flexibility with high efficiency. Compared to HDA, our design is, on average, 34.6% lower in latency at the cost of a 6.4% area overhead and negligible energy overhead.
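The abstract does not spell out how per-layer mode selection and transition minimization are combined, so the following is only an illustrative sketch of one common way to pose such a problem, not the authors' algorithm. It assumes a hypothetical table `layer_cost[i][m]` (for example, a latency estimate from a DSE tool for layer `i` in operating mode `m`) and a hypothetical fixed, non-negative `switch_cost` paid whenever consecutive layers use different modes, and it solves the resulting problem with dynamic programming.

```python
from typing import List, Sequence, Tuple


def select_modes(
    layer_cost: Sequence[Sequence[float]],
    switch_cost: float,
) -> Tuple[float, List[int]]:
    """Pick one operating mode per layer, minimizing total cost.

    layer_cost[i][m] is a hypothetical per-layer cost of running layer i in
    mode m; switch_cost is a hypothetical non-negative penalty for changing
    mode between two consecutive layers. Both are assumptions for
    illustration, not values or definitions taken from the paper.
    """
    if not layer_cost:
        return 0.0, []
    num_modes = len(layer_cost[0])
    # best[m]: minimum cost of all layers so far when the latest runs in mode m.
    best = list(layer_cost[0])
    # back[k][m]: mode of the previous layer on the best path ending in mode m.
    back: List[List[int]] = []
    for costs in layer_cost[1:]:
        # With a uniform switch cost, the best predecessor is either the same
        # mode (no switch) or the globally cheapest mode (one switch).
        cheapest_prev = min(range(num_modes), key=lambda m: best[m])
        new_best: List[float] = []
        choices: List[int] = []
        for m in range(num_modes):
            stay = best[m]
            switch = best[cheapest_prev] + switch_cost
            if stay <= switch:
                new_best.append(stay + costs[m])
                choices.append(m)
            else:
                new_best.append(switch + costs[m])
                choices.append(cheapest_prev)
        best = new_best
        back.append(choices)
    # Walk the back-pointers from the cheapest final mode to recover the plan.
    last = min(range(num_modes), key=lambda m: best[m])
    total = best[last]
    modes = [last]
    for choices in reversed(back):
        modes.append(choices[modes[-1]])
    modes.reverse()
    return total, modes
```

In this toy formulation, a zero `switch_cost` reduces to choosing the cheapest mode for each layer independently, while a larger penalty makes the plan stay in one mode across layers even when that mode is not locally the cheapest. That is the kind of trade-off the transition-minimizing optimization in the abstract targets.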


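Source journal

Future Generation Computer Systems-The International Journal of Escience (Engineering & Technology - Computer Science: Theory & Methods)

CiteScore: 19.90

Self-citation rate: 2.70%

Articles published: 376

Review time: 10.6 months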

About the journal: Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications. Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration. Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.