RAD-Sim: Rapid Architecture Exploration for Novel Reconfigurable Acceleration Devices

Andrew Boutros, E. Nurvitadhi, Vaughn Betz
Published in: 2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)
Publication date: 2022-08-01
DOI: 10.1109/FPL57034.2022.00072 (https://doi.org/10.1109/FPL57034.2022.00072)
Citations: 3

Abstract

With the continued growth in field-programmable gate array (FPGA) capacity and their incorporation into new environments such as datacenters, we have witnessed the introduction of a new class of reconfigurable acceleration devices (RADs) that go beyond conventional FPGA architectures. These devices combine a reconfigurable fabric with coarse-grained domain-specialized accelerator blocks all connected via a high-performance packet-switched network-on-chip (NoC) for efficient system-wide communication. However, we lack the tools necessary to efficiently explore the huge design space for RADs, study the complex interactions between their different components and evaluate various combinations of design choices. In this work, we develop RAD-Sim, a cycle-level architecture simulator that allows rapid application-driven exploration of the design space of novel RADs. To showcase the capabilities of RAD-Sim, we map and simulate a state-of-the-art deep learning (DL) inference overlay on a RAD instance incorporating an FPGA fabric and a complex of hard matrix-vector multiplication engines, communicating over a system-wide NoC. Through this example, we show how RAD-Sim can help architects quantify the effect of changing specific architecture parameters on end-to-end application performance.