CHEF:在配备异构 FPGA 的集群上部署异构模型的框架

IF 2.7 3区 计算机科学 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Yue Tang;Yukai Song;Naveena Elango;Sheena Ratnam Priya;Alex K. Jones;Jinjun Xiong;Peipei Zhou;Jingtong Hu
{"title":"CHEF:在配备异构 FPGA 的集群上部署异构模型的框架","authors":"Yue Tang;Yukai Song;Naveena Elango;Sheena Ratnam Priya;Alex K. Jones;Jinjun Xiong;Peipei Zhou;Jingtong Hu","doi":"10.1109/TCAD.2024.3438994","DOIUrl":null,"url":null,"abstract":"Deep neural networks (DNNs) are rapidly evolving from streamlined single-modality single-task (SMST) to multimodality multitask (MMMT) with large variations for different layers and complex data dependencies among layers. To support such models, hardware systems also evolved to be heterogeneous. The heterogeneous system comes from the prevailing trend to integrate diverse accelerators into the system for lower latency. FPGAs have high-computation density and communication bandwidth and are configurable to be deployed with different designs of accelerators, which are widely used for various machine-learning applications. However, scaling from SMST to MMMT on heterogeneous FPGAs is challenging since MMMT has much larger layer variations, a massive number of layers, and complex data dependency among different backbones. Previous mapping algorithms are either inefficient or over-simplified which makes them impractical in general scenarios. In this work, we propose CHEF to enable efficient implementation of MMMT models in realistic heterogeneous FPGA clusters, i.e., deploying heterogeneous accelerators on heterogeneous FPGAs (A2F) and mapping the heterogeneous DNNs on the deployed heterogeneous accelerators (M2A). We propose CHEF-A2F, a two-stage accelerators-to-FPGAs deployment approach to co-optimize hardware deployment and accelerator mapping. In addition, we propose CHEF-M2A, which can support general and practical cases compared to previous mapping algorithms. To the best of our knowledge, this is the first attempt to implement MMMT models in real heterogeneous FPGA clusters. Experimental results show that the latency obtained with CHEF is near-optimal while the search time is 10\n<inline-formula> <tex-math>$000\\times $ </tex-math></inline-formula>\n less than exhaustively searching the optimal solution.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3937-3948"},"PeriodicalIF":2.7000,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"CHEF: A Framework for Deploying Heterogeneous Models on Clusters With Heterogeneous FPGAs\",\"authors\":\"Yue Tang;Yukai Song;Naveena Elango;Sheena Ratnam Priya;Alex K. Jones;Jinjun Xiong;Peipei Zhou;Jingtong Hu\",\"doi\":\"10.1109/TCAD.2024.3438994\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep neural networks (DNNs) are rapidly evolving from streamlined single-modality single-task (SMST) to multimodality multitask (MMMT) with large variations for different layers and complex data dependencies among layers. To support such models, hardware systems also evolved to be heterogeneous. The heterogeneous system comes from the prevailing trend to integrate diverse accelerators into the system for lower latency. FPGAs have high-computation density and communication bandwidth and are configurable to be deployed with different designs of accelerators, which are widely used for various machine-learning applications. However, scaling from SMST to MMMT on heterogeneous FPGAs is challenging since MMMT has much larger layer variations, a massive number of layers, and complex data dependency among different backbones. Previous mapping algorithms are either inefficient or over-simplified which makes them impractical in general scenarios. In this work, we propose CHEF to enable efficient implementation of MMMT models in realistic heterogeneous FPGA clusters, i.e., deploying heterogeneous accelerators on heterogeneous FPGAs (A2F) and mapping the heterogeneous DNNs on the deployed heterogeneous accelerators (M2A). We propose CHEF-A2F, a two-stage accelerators-to-FPGAs deployment approach to co-optimize hardware deployment and accelerator mapping. In addition, we propose CHEF-M2A, which can support general and practical cases compared to previous mapping algorithms. To the best of our knowledge, this is the first attempt to implement MMMT models in real heterogeneous FPGA clusters. Experimental results show that the latency obtained with CHEF is near-optimal while the search time is 10\\n<inline-formula> <tex-math>$000\\\\times $ </tex-math></inline-formula>\\n less than exhaustively searching the optimal solution.\",\"PeriodicalId\":13251,\"journal\":{\"name\":\"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems\",\"volume\":\"43 11\",\"pages\":\"3937-3948\"},\"PeriodicalIF\":2.7000,\"publicationDate\":\"2024-11-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10745763/\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10745763/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0

摘要

深度神经网络(DNN)正在从精简的单模态单任务(SMST)快速发展到多模态多任务(MMMT),不同层之间的差异很大,层与层之间的数据依赖关系也很复杂。为了支持这种模式,硬件系统也演变为异构系统。异构系统源于将各种加速器集成到系统中以降低延迟的流行趋势。FPGA 具有高计算密度和通信带宽,可配置不同设计的加速器,广泛应用于各种机器学习应用。然而,在异构 FPGA 上从 SMST 扩展到 MMMT 是一项挑战,因为 MMMT 的层变化更大,层数更多,而且不同骨干网之间存在复杂的数据依赖关系。以往的映射算法要么效率低下,要么过于简化,因此在一般情况下不切实际。在这项工作中,我们提出了 CHEF,以便在现实的异构 FPGA 集群中高效实现 MMMT 模型,即在异构 FPGA 上部署异构加速器(A2F),并在部署的异构加速器上映射异构 DNN(M2A)。我们提出了 CHEF-A2F,一种两阶段加速器到 FPGA 的部署方法,以共同优化硬件部署和加速器映射。此外,我们还提出了 CHEF-M2A,与之前的映射算法相比,它可以支持一般的实用案例。据我们所知,这是首次尝试在实际异构 FPGA 集群中实施 MMMT 模型。实验结果表明,使用 CHEF 得到的延迟接近最优,而搜索时间则比穷举搜索最优解少 10000 次。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
CHEF: A Framework for Deploying Heterogeneous Models on Clusters With Heterogeneous FPGAs
Deep neural networks (DNNs) are rapidly evolving from streamlined single-modality single-task (SMST) to multimodality multitask (MMMT) with large variations for different layers and complex data dependencies among layers. To support such models, hardware systems also evolved to be heterogeneous. The heterogeneous system comes from the prevailing trend to integrate diverse accelerators into the system for lower latency. FPGAs have high-computation density and communication bandwidth and are configurable to be deployed with different designs of accelerators, which are widely used for various machine-learning applications. However, scaling from SMST to MMMT on heterogeneous FPGAs is challenging since MMMT has much larger layer variations, a massive number of layers, and complex data dependency among different backbones. Previous mapping algorithms are either inefficient or over-simplified which makes them impractical in general scenarios. In this work, we propose CHEF to enable efficient implementation of MMMT models in realistic heterogeneous FPGA clusters, i.e., deploying heterogeneous accelerators on heterogeneous FPGAs (A2F) and mapping the heterogeneous DNNs on the deployed heterogeneous accelerators (M2A). We propose CHEF-A2F, a two-stage accelerators-to-FPGAs deployment approach to co-optimize hardware deployment and accelerator mapping. In addition, we propose CHEF-M2A, which can support general and practical cases compared to previous mapping algorithms. To the best of our knowledge, this is the first attempt to implement MMMT models in real heterogeneous FPGA clusters. Experimental results show that the latency obtained with CHEF is near-optimal while the search time is 10 $000\times $ less than exhaustively searching the optimal solution.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
5.60
自引率
13.80%
发文量
500
审稿时长
7 months
期刊介绍: The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical and logical design including: planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信