HS-GCN: A High-Performance, Sustainable, and Scalable Chiplet-Based Accelerator for Graph Convolutional Network Inference

IF 3.9 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Yingnan Zhao;Ke Wang;Ahmed Louri
{"title":"HS-GCN:一种高性能、可持续、可扩展的基于芯片的图卷积网络推理加速器","authors":"Yingnan Zhao;Ke Wang;Ahmed Louri","doi":"10.1109/TSUSC.2025.3575285","DOIUrl":null,"url":null,"abstract":"Graph Convolutional Networks (GCNs) have been proposed to extend machine learning techniques for graph-related applications. A typical GCN model consists of multiple layers, each including an aggregation phase, which is communication-intensive, and a combination phase, which is computation-intensive. As the size of real-world graphs increases exponentially, current customized accelerators face challenges in efficiently performing GCN inference due to limited on-chip buffers and other hardware resources for both data computation and communication, which degrades performance and energy efficiency. Additionally, scaling current monolithic designs to address the aforementioned challenges will introduce significant cost-effectiveness issues in terms of power, area, and yield. To this end, we propose HS-GCN, a high-performance, sustainable, and scalable chiplet-based accelerator for GCN inference with much-improved energy efficiency. Specifically, HS-GCN integrates multiple reconfigurable chiplets, each of which can be configured to perform the main computations of either the aggregation phase or the combination phase, including Sparse-dense matrix multiplication (SpMM) and General matrix-matrix multiplication (GeMM). HS-GCN implements an active interposer with a flexible interconnection fabric to connect chiplets and other hardware components for efficient data communication. Additionally, HS-GCN introduces two system-level control algorithms that dynamically determine the computation order and corresponding dataflow based on the input graphs and GCN models. These selections are used to further configure the chiplet array and interconnection fabric for much-improved performance and energy efficiency. Evaluation results using real-world graphs demonstrate that HS-GCN achieves significant speedups of 26.7×, 11.2×, 3.9×, 4.7×, 3.1×, along with substantial memory access savings of 94%, 89%, 64%, 85%, 54%, and energy savings of 87%, 84%, 49%, 78%, 41% on average, as compared to HyGCN, AWB-GCN, GCNAX, I-GCN, and SGCN, respectively.","PeriodicalId":13268,"journal":{"name":"IEEE Transactions on Sustainable Computing","volume":"10 5","pages":"1019-1030"},"PeriodicalIF":3.9000,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"HS-GCN: A High-Performance, Sustainable, and Scalable Chiplet-Based Accelerator for Graph Convolutional Network Inference\",\"authors\":\"Yingnan Zhao;Ke Wang;Ahmed Louri\",\"doi\":\"10.1109/TSUSC.2025.3575285\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Graph Convolutional Networks (GCNs) have been proposed to extend machine learning techniques for graph-related applications. A typical GCN model consists of multiple layers, each including an aggregation phase, which is communication-intensive, and a combination phase, which is computation-intensive. As the size of real-world graphs increases exponentially, current customized accelerators face challenges in efficiently performing GCN inference due to limited on-chip buffers and other hardware resources for both data computation and communication, which degrades performance and energy efficiency. 
Additionally, scaling current monolithic designs to address the aforementioned challenges will introduce significant cost-effectiveness issues in terms of power, area, and yield. To this end, we propose HS-GCN, a high-performance, sustainable, and scalable chiplet-based accelerator for GCN inference with much-improved energy efficiency. Specifically, HS-GCN integrates multiple reconfigurable chiplets, each of which can be configured to perform the main computations of either the aggregation phase or the combination phase, including Sparse-dense matrix multiplication (SpMM) and General matrix-matrix multiplication (GeMM). HS-GCN implements an active interposer with a flexible interconnection fabric to connect chiplets and other hardware components for efficient data communication. Additionally, HS-GCN introduces two system-level control algorithms that dynamically determine the computation order and corresponding dataflow based on the input graphs and GCN models. These selections are used to further configure the chiplet array and interconnection fabric for much-improved performance and energy efficiency. Evaluation results using real-world graphs demonstrate that HS-GCN achieves significant speedups of 26.7×, 11.2×, 3.9×, 4.7×, 3.1×, along with substantial memory access savings of 94%, 89%, 64%, 85%, 54%, and energy savings of 87%, 84%, 49%, 78%, 41% on average, as compared to HyGCN, AWB-GCN, GCNAX, I-GCN, and SGCN, respectively.\",\"PeriodicalId\":13268,\"journal\":{\"name\":\"IEEE Transactions on Sustainable Computing\",\"volume\":\"10 5\",\"pages\":\"1019-1030\"},\"PeriodicalIF\":3.9000,\"publicationDate\":\"2025-06-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Sustainable Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11018459/\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Sustainable Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11018459/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 0

Abstract

Graph Convolutional Networks (GCNs) have been proposed to extend machine learning techniques for graph-related applications. A typical GCN model consists of multiple layers, each including an aggregation phase, which is communication-intensive, and a combination phase, which is computation-intensive. As the size of real-world graphs increases exponentially, current customized accelerators face challenges in efficiently performing GCN inference due to limited on-chip buffers and other hardware resources for both data computation and communication, which degrades performance and energy efficiency. Additionally, scaling current monolithic designs to address the aforementioned challenges will introduce significant cost-effectiveness issues in terms of power, area, and yield. To this end, we propose HS-GCN, a high-performance, sustainable, and scalable chiplet-based accelerator for GCN inference with much-improved energy efficiency. Specifically, HS-GCN integrates multiple reconfigurable chiplets, each of which can be configured to perform the main computations of either the aggregation phase or the combination phase, including Sparse-dense matrix multiplication (SpMM) and General matrix-matrix multiplication (GeMM). HS-GCN implements an active interposer with a flexible interconnection fabric to connect chiplets and other hardware components for efficient data communication. Additionally, HS-GCN introduces two system-level control algorithms that dynamically determine the computation order and corresponding dataflow based on the input graphs and GCN models. These selections are used to further configure the chiplet array and interconnection fabric for much-improved performance and energy efficiency. Evaluation results using real-world graphs demonstrate that HS-GCN achieves significant speedups of 26.7×, 11.2×, 3.9×, 4.7×, 3.1×, along with substantial memory access savings of 94%, 89%, 64%, 85%, 54%, and energy savings of 87%, 84%, 49%, 78%, 41% on average, as compared to HyGCN, AWB-GCN, GCNAX, I-GCN, and SGCN, respectively.
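The abstract treats each GCN layer as an aggregation phase (an SpMM over the sparse adjacency matrix) followed by a combination phase (a dense GeMM with the weight matrix), and notes that HS-GCN's control algorithms choose the computation order per layer. The sketch below is only an illustration of that idea in plain Python/SciPy; the names (A_hat, X, W), the cost heuristic, and the ReLU activation are assumptions made for the example and are not taken from the paper.

```python
# Minimal illustrative sketch (not the HS-GCN design): one GCN layer written
# as SpMM (aggregation) plus GeMM (combination), evaluated in either order.
import numpy as np
import scipy.sparse as sp

def gcn_layer(A_hat, X, W, aggregate_first):
    """Compute ReLU(A_hat @ X @ W) in one of the two possible orders."""
    if aggregate_first:
        # Aggregation first: SpMM (sparse N x N times dense N x F_in),
        # then GeMM with the weight matrix.
        H = np.asarray(A_hat @ X)
        return np.maximum(H @ W, 0.0)
    # Combination first: the GeMM shrinks the feature width before the SpMM,
    # so the sparse multiply moves less data.
    H = X @ W
    return np.maximum(np.asarray(A_hat @ H), 0.0)

# Tiny synthetic example.
N, F_in, F_out = 4, 8, 2
rng = np.random.default_rng(0)
A_hat = sp.random(N, N, density=0.5, format="csr", random_state=0)  # stand-in for a normalized adjacency matrix
X = rng.standard_normal((N, F_in))      # node features
W = rng.standard_normal((F_in, F_out))  # layer weights

# Assumed toy heuristic: combine first whenever it reduces the feature width.
out = gcn_layer(A_hat, X, W, aggregate_first=(F_out >= F_in))
print(out.shape)  # -> (4, 2)
```

Both orders give the same result up to floating-point rounding; the point is that the cheaper order depends on graph sparsity and feature widths, which is the kind of input-dependent decision the paper's system-level control algorithms are described as making.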
Source journal
IEEE Transactions on Sustainable Computing
Category: Mathematics, Control and Optimization
CiteScore: 7.70
Self-citation rate: 2.60%
Articles per year: 54