HS-GCN: A High-Performance, Sustainable, and Scalable Chiplet-Based Accelerator for Graph Convolutional Network Inference

Yingnan Zhao; Ke Wang; Ahmed Louri
IEEE Transactions on Sustainable Computing, vol. 10, no. 5, pp. 1019-1030, published 2025-06-02
DOI: 10.1109/TSUSC.2025.3575285
https://ieeexplore.ieee.org/document/11018459/
Graph Convolutional Networks (GCNs) extend machine learning techniques to graph-related applications. A typical GCN model consists of multiple layers, each comprising a communication-intensive aggregation phase and a computation-intensive combination phase. As the size of real-world graphs grows exponentially, current customized accelerators struggle to perform GCN inference efficiently: limited on-chip buffers and other hardware resources for both data computation and communication degrade performance and energy efficiency. Additionally, scaling current monolithic designs to address these challenges introduces significant cost-effectiveness issues in terms of power, area, and yield. To this end, we propose HS-GCN, a high-performance, sustainable, and scalable chiplet-based accelerator for GCN inference with greatly improved energy efficiency. Specifically, HS-GCN integrates multiple reconfigurable chiplets, each of which can be configured to perform the main computations of either the aggregation phase or the combination phase, namely sparse-dense matrix multiplication (SpMM) and general matrix-matrix multiplication (GeMM). HS-GCN implements an active interposer with a flexible interconnection fabric that connects chiplets and other hardware components for efficient data communication. Furthermore, HS-GCN introduces two system-level control algorithms that dynamically determine the computation order and the corresponding dataflow based on the input graphs and GCN models; these decisions further configure the chiplet array and interconnection fabric for improved performance and energy efficiency.
Evaluation results on real-world graphs demonstrate that, compared to HyGCN, AWB-GCN, GCNAX, I-GCN, and SGCN, HS-GCN achieves average speedups of 26.7×, 11.2×, 3.9×, 4.7×, and 3.1×, memory access savings of 94%, 89%, 64%, 85%, and 54%, and energy savings of 87%, 84%, 49%, 78%, and 41%, respectively.
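To make the two phases concrete, the sketch below shows one generic GCN layer in Python: aggregation as an SpMM-style traversal of stored edges, followed by a dense GeMM with the layer weights. This is a minimal illustration of the standard GCN computation only, not the HS-GCN dataflow; the function name, edge-list representation, and use of an unnormalized adjacency are illustrative assumptions.

```python
import numpy as np

def gcn_layer(edges, features, weights):
    """One generic GCN layer: aggregation (SpMM-style), then combination (GeMM).

    edges    -- list of (src, dst) pairs; each dst sums features from its src neighbors
    features -- dense node-feature matrix, shape (N, F_in)
    weights  -- dense layer weights, shape (F_in, F_out)
    """
    # Aggregation phase: sparse, irregular, communication-intensive.
    # Only the stored edges are touched, mirroring an SpMM with the adjacency matrix.
    aggregated = np.zeros_like(features)
    for src, dst in edges:
        aggregated[dst] += features[src]

    # Combination phase: dense, regular, computation-intensive GeMM.
    combined = aggregated @ weights
    return np.maximum(combined, 0.0)  # ReLU activation

# Tiny 3-node example (self-loops included, no degree normalization)
edges = [(0, 0), (1, 1), (2, 2), (0, 1), (1, 0), (1, 2), (2, 1)]
x = np.ones((3, 2))   # N=3 nodes, F_in=2
w = np.ones((2, 4))   # F_in=2, F_out=4
out = gcn_layer(edges, x, w)
print(out.shape)  # (3, 4)
```

In real accelerators these two phases stress different resources, which is why HS-GCN configures each chiplet for one phase or the other: the aggregation loop is bound by irregular memory traffic, while the GeMM is bound by arithmetic throughput.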