FPGA-assisted Design Space Exploration of Parameterized AI Accelerators: A Quickloop Approach

Impact factor: 3.7; Zone 2, Computer Science; JCR Q1, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Journal of Systems Architecture, vol. 155, Article 103260. Published 2024-08-19. DOI: 10.1016/j.sysarc.2024.103260

Kashif Inayat, Fahad Bin Muslim, Tayyeb Mahmood, Jaeyong Chung


Citations: 0

Abstract


FPGAs facilitate prototyping and debugging and, owing to their rapid turnaround time (TAT), have recently been used to accelerate full-stack simulations. However, this TAT remains a bottleneck for exhaustive design space exploration of parameterized RTL generators, especially DNN accelerators, whose full-stack search space grows explosively. This paper presents Quickloop, an efficient and scalable framework that enables FPGA-accelerated exploration. Quickloop first abstracts away the cumbersome flow of RTL generation, software stack, FPGA toolflow, workload execution, and metrics extraction by wrapping these stages into isolated Quicksteps that support cascading, scaling, and replay. Then, we analytically minimize the FPGA toolflow TAT via a novel, data-driven strategy that reuses build fragments from previous iterations, improving loop efficiency while also lowering the toolflow's compute utilization.
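The abstract does not give Quickloop's implementation, so the following is only a minimal Python sketch of the two ideas above, under stated assumptions. Each stage is wrapped in a hypothetical `Quickstep` class that caches its output under a hash of its inputs, which gives cascading (each stage feeds the next) and replay (a repeated configuration is served from cache instead of being rebuilt). The hypothetical `choose_base_build` function is a simple nearest-neighbour heuristic that picks the earlier build differing in the fewest parameters; it merely stands in for the paper's data-driven fragment-reuse strategy and is not that strategy. Parameter names such as `mesh_dim` are illustrative only.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Optional


class Quickstep:
    """Hypothetical stage wrapper: runs `fn` once per distinct input, replays cached results."""

    def __init__(self, name: str, fn: Callable[[Dict[str, Any]], Dict[str, Any]]):
        self.name = name
        self.fn = fn
        self.cache: Dict[str, Dict[str, Any]] = {}
        self.runs = 0  # number of real (non-replayed) executions

    def _key(self, inputs: Dict[str, Any]) -> str:
        # Canonical JSON so that dict ordering does not change the cache key.
        blob = json.dumps(inputs, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

    def __call__(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        key = self._key(inputs)
        if key not in self.cache:
            self.runs += 1
            self.cache[key] = self.fn(inputs)
        return self.cache[key]


def cascade(steps: List[Quickstep], inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Feed each stage's output, merged with everything produced so far, into the next stage."""
    data = dict(inputs)
    for step in steps:
        data = {**data, **step(data)}
    return data


def choose_base_build(params: Dict[str, Any],
                      previous: List[Dict[str, Any]]) -> Optional[int]:
    """Hypothetical heuristic: index of the earlier build differing in the fewest parameters."""
    if not previous:
        return None

    def distance(other: Dict[str, Any]) -> int:
        keys = set(params) | set(other)
        return sum(params.get(k) != other.get(k) for k in keys)

    return min(range(len(previous)), key=lambda i: distance(previous[i]))
```

For example, with two toy stages, `rtl = Quickstep("rtl_gen", lambda d: {"rtl": f"gemmini_{d['mesh_dim']}x{d['mesh_dim']}"})` and `fpga = Quickstep("fpga_build", lambda d: {"bitstream": d["rtl"] + ".bit"})`, calling `cascade([rtl, fpga], {"mesh_dim": 16})` twice executes each stage only once. In a real flow the lambdas would invoke RTL generation and FPGA synthesis tools, and `choose_base_build` would decide which earlier build's fragments to start from.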

Quickloop is built around the OpenAI Gym environment framework and therefore supports drop-in regression-based and reinforcement-learning-based exploration. Wrapping a Quickloop around Berkeley's Gemmini DNN accelerator as a reference design, we exhaustively explore its parameter space using full-stack simulation of ImageNet benchmarks as the workload, and uncover complex performance patterns. We further show that, compared with a conventional FPGA toolflow, Quickloop reduces episode time by more than 30% as episodes approach realistic lengths.
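Because exploration is exposed through the OpenAI Gym environment interface, an explorer only needs `reset()` and `step(action)`. The sketch below is a hypothetical, dependency-free environment that mirrors the classic Gym convention (`step` returns observation, reward, done, info) without importing `gym`; newer Gym and Gymnasium releases return a five-tuple instead. `AcceleratorDSEEnv`, `exhaustive_sweep`, and the `evaluate` callback, which stands in for a full-stack FPGA simulation run, are all assumptions and not the paper's API.

```python
import itertools
from typing import Any, Callable, Dict, List, Optional, Tuple


class AcceleratorDSEEnv:
    """Hypothetical Gym-style environment over a discrete accelerator parameter grid."""

    def __init__(self, space: Dict[str, List[Any]],
                 evaluate: Callable[[Dict[str, Any]], float],
                 episode_length: int):
        names = sorted(space)
        # Flatten the Cartesian product of parameter values into integer actions.
        self.configs = [dict(zip(names, values))
                        for values in itertools.product(*(space[n] for n in names))]
        self.evaluate = evaluate  # stand-in for the full-stack FPGA run; returns a cost
        self.episode_length = episode_length
        self.t = 0

    def reset(self) -> Dict[str, Any]:
        self.t = 0
        return {}  # no design has been evaluated yet in this episode

    def step(self, action: int) -> Tuple[Dict[str, Any], float, bool, Dict[str, Any]]:
        config = self.configs[action]
        cost = self.evaluate(config)
        self.t += 1
        done = self.t >= self.episode_length
        # Lower cost means higher reward.
        return config, -cost, done, {"cost": cost}


def exhaustive_sweep(env: AcceleratorDSEEnv) -> Tuple[Optional[Dict[str, Any]], float]:
    """Evaluate every configuration once and return the lowest-cost one."""
    best_config, best_cost = None, float("inf")
    env.reset()
    for action in range(len(env.configs)):
        config, _, done, info = env.step(action)
        if info["cost"] < best_cost:
            best_config, best_cost = config, info["cost"]
        if done:
            env.reset()  # start a new episode if the sweep outlasts one
    return best_config, best_cost
```

A regression model or RL agent could replace `exhaustive_sweep` by choosing actions itself. As a quick check, a made-up placeholder cost such as `def toy_cost(c): return 1e6 / c["mesh_dim"] ** 2 + 1e5 / c["sp_kib"]` (not Gemmini data) over `{"mesh_dim": [4, 8, 16], "sp_kib": [128, 256]}` selects the largest mesh and scratchpad. Flattening the grid into integer actions keeps the action set in the shape a Gym `Discrete` space describes.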


Source journal

Journal of Systems Architecture (Engineering and Technology; Computer Science: Hardware)

CiteScore: 8.70
Self-citation rate: 15.60%
Publication volume: 226 articles
Review time: 46 days

Journal description: The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, parallel and distributed architectures, as well as additional subjects in the computer and system architecture area, fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software. Design automation of such systems, including methodologies, techniques and tools for their design, as well as novel designs of software components, fall within the scope of this journal. Novel applications that use embedded systems are also central to this journal. While hardware is not a part of this journal, hardware/software co-design methods that consider the interplay between software and hardware components with an emphasis on software are also relevant here.