CHEF：在配备异构 FPGA 的集群上部署异构模型的框架

IF 2.7 3区计算机科学 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI:10.1109/TCAD.2024.3438994

Yue Tang;Yukai Song;Naveena Elango;Sheena Ratnam Priya;Alex K. Jones;Jinjun Xiong;Peipei Zhou;Jingtong Hu

{"title":"CHEF：在配备异构 FPGA 的集群上部署异构模型的框架","authors":"Yue Tang;Yukai Song;Naveena Elango;Sheena Ratnam Priya;Alex K. Jones;Jinjun Xiong;Peipei Zhou;Jingtong Hu","doi":"10.1109/TCAD.2024.3438994","DOIUrl":null,"url":null,"abstract":"Deep neural networks (DNNs) are rapidly evolving from streamlined single-modality single-task (SMST) to multimodality multitask (MMMT) with large variations for different layers and complex data dependencies among layers. To support such models, hardware systems also evolved to be heterogeneous. The heterogeneous system comes from the prevailing trend to integrate diverse accelerators into the system for lower latency. FPGAs have high-computation density and communication bandwidth and are configurable to be deployed with different designs of accelerators, which are widely used for various machine-learning applications. However, scaling from SMST to MMMT on heterogeneous FPGAs is challenging since MMMT has much larger layer variations, a massive number of layers, and complex data dependency among different backbones. Previous mapping algorithms are either inefficient or over-simplified which makes them impractical in general scenarios. In this work, we propose CHEF to enable efficient implementation of MMMT models in realistic heterogeneous FPGA clusters, i.e., deploying heterogeneous accelerators on heterogeneous FPGAs (A2F) and mapping the heterogeneous DNNs on the deployed heterogeneous accelerators (M2A). We propose CHEF-A2F, a two-stage accelerators-to-FPGAs deployment approach to co-optimize hardware deployment and accelerator mapping. In addition, we propose CHEF-M2A, which can support general and practical cases compared to previous mapping algorithms. To the best of our knowledge, this is the first attempt to implement MMMT models in real heterogeneous FPGA clusters. Experimental results show that the latency obtained with CHEF is near-optimal while the search time is 10\n<inline-formula> <tex-math>$000\\times $ </tex-math></inline-formula>\n less than exhaustively searching the optimal solution.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3937-3948"},"PeriodicalIF":2.7000,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"CHEF: A Framework for Deploying Heterogeneous Models on Clusters With Heterogeneous FPGAs\",\"authors\":\"Yue Tang;Yukai Song;Naveena Elango;Sheena Ratnam Priya;Alex K. Jones;Jinjun Xiong;Peipei Zhou;Jingtong Hu\",\"doi\":\"10.1109/TCAD.2024.3438994\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep neural networks (DNNs) are rapidly evolving from streamlined single-modality single-task (SMST) to multimodality multitask (MMMT) with large variations for different layers and complex data dependencies among layers. To support such models, hardware systems also evolved to be heterogeneous. The heterogeneous system comes from the prevailing trend to integrate diverse accelerators into the system for lower latency. FPGAs have high-computation density and communication bandwidth and are configurable to be deployed with different designs of accelerators, which are widely used for various machine-learning applications. However, scaling from SMST to MMMT on heterogeneous FPGAs is challenging since MMMT has much larger layer variations, a massive number of layers, and complex data dependency among different backbones. Previous mapping algorithms are either inefficient or over-simplified which makes them impractical in general scenarios. In this work, we propose CHEF to enable efficient implementation of MMMT models in realistic heterogeneous FPGA clusters, i.e., deploying heterogeneous accelerators on heterogeneous FPGAs (A2F) and mapping the heterogeneous DNNs on the deployed heterogeneous accelerators (M2A). We propose CHEF-A2F, a two-stage accelerators-to-FPGAs deployment approach to co-optimize hardware deployment and accelerator mapping. In addition, we propose CHEF-M2A, which can support general and practical cases compared to previous mapping algorithms. To the best of our knowledge, this is the first attempt to implement MMMT models in real heterogeneous FPGA clusters. Experimental results show that the latency obtained with CHEF is near-optimal while the search time is 10\\n<inline-formula> <tex-math>$000\\\\times $ </tex-math></inline-formula>\\n less than exhaustively searching the optimal solution.\",\"PeriodicalId\":13251,\"journal\":{\"name\":\"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems\",\"volume\":\"43 11\",\"pages\":\"3937-3948\"},\"PeriodicalIF\":2.7000,\"publicationDate\":\"2024-11-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10745763/\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10745763/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}

引用次数: 0

摘要

深度神经网络（DNN）正在从精简的单模态单任务（SMST）快速发展到多模态多任务（MMMT），不同层之间的差异很大，层与层之间的数据依赖关系也很复杂。为了支持这种模式，硬件系统也演变为异构系统。异构系统源于将各种加速器集成到系统中以降低延迟的流行趋势。FPGA 具有高计算密度和通信带宽，可配置不同设计的加速器，广泛应用于各种机器学习应用。然而，在异构 FPGA 上从 SMST 扩展到 MMMT 是一项挑战，因为 MMMT 的层变化更大，层数更多，而且不同骨干网之间存在复杂的数据依赖关系。以往的映射算法要么效率低下，要么过于简化，因此在一般情况下不切实际。在这项工作中，我们提出了 CHEF，以便在现实的异构 FPGA 集群中高效实现 MMMT 模型，即在异构 FPGA 上部署异构加速器（A2F），并在部署的异构加速器上映射异构 DNN（M2A）。我们提出了 CHEF-A2F，一种两阶段加速器到 FPGA 的部署方法，以共同优化硬件部署和加速器映射。此外，我们还提出了 CHEF-M2A，与之前的映射算法相比，它可以支持一般的实用案例。据我们所知，这是首次尝试在实际异构 FPGA 集群中实施 MMMT 模型。实验结果表明，使用 CHEF 得到的延迟接近最优，而搜索时间则比穷举搜索最优解少 10000 次。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

CHEF: A Framework for Deploying Heterogeneous Models on Clusters With Heterogeneous FPGAs

Deep neural networks (DNNs) are rapidly evolving from streamlined single-modality single-task (SMST) to multimodality multitask (MMMT) with large variations for different layers and complex data dependencies among layers. To support such models, hardware systems also evolved to be heterogeneous. The heterogeneous system comes from the prevailing trend to integrate diverse accelerators into the system for lower latency. FPGAs have high-computation density and communication bandwidth and are configurable to be deployed with different designs of accelerators, which are widely used for various machine-learning applications. However, scaling from SMST to MMMT on heterogeneous FPGAs is challenging since MMMT has much larger layer variations, a massive number of layers, and complex data dependency among different backbones. Previous mapping algorithms are either inefficient or over-simplified which makes them impractical in general scenarios. In this work, we propose CHEF to enable efficient implementation of MMMT models in realistic heterogeneous FPGA clusters, i.e., deploying heterogeneous accelerators on heterogeneous FPGAs (A2F) and mapping the heterogeneous DNNs on the deployed heterogeneous accelerators (M2A). We propose CHEF-A2F, a two-stage accelerators-to-FPGAs deployment approach to co-optimize hardware deployment and accelerator mapping. In addition, we propose CHEF-M2A, which can support general and practical cases compared to previous mapping algorithms. To the best of our knowledge, this is the first attempt to implement MMMT models in real heterogeneous FPGA clusters. Experimental results show that the latency obtained with CHEF is near-optimal while the search time is 10

$000\times $

less than exhaustively searching the optimal solution.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 工程技术-工程：电子与电气

CiteScore

5.60

自引率

13.80%

发文量

500

审稿时长

7 months

期刊介绍： The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical and logical design including: planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.