Title: ChainPIM: A ReRAM-Based Processing-in-Memory Accelerator for HGNNs via Chain Structure
Authors: Wenjing Xiao; Jianyu Wang; Dan Chen; Chenglong Shi; Xin Ling; Min Chen; Thomas Wu
DOI: 10.1109/TCAD.2025.3528906
Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 7, pp. 2516-2529 (Q2, Computer Science, Hardware & Architecture; IF 2.7)
Published: 2025-01-13 (Journal Article)
URL: https://ieeexplore.ieee.org/document/10839072/
Cited by: 0
Abstract
Heterogeneous graph neural networks (HGNNs) have recently demonstrated significant advantages in capturing rich structural and semantic information in heterogeneous graphs. Unlike homogeneous graph neural networks, which aggregate information directly from neighbors, HGNNs aggregate information along complex metapaths. ReRAM-based processing-in-memory (PIM) architectures reduce data movement and compute matrix-vector multiplication (MVM) in the analog domain, making them well suited to accelerating HGNNs. However, the complex metapath-based aggregation in HGNNs makes it challenging to exploit the parallelism of ReRAM and to reuse vertex data efficiently. To this end, we propose ChainPIM, the first ReRAM-based PIM accelerator for HGNNs featuring high computing parallelism and vertex data reuse. Specifically, we introduce the R-chain, a chain structure that groups related metapath instances together; vertices are reused efficiently within an R-chain, and different R-chains are processed in parallel. We further design an efficient storage format for R-chains that eliminates much of the redundant vertex storage. Finally, a specialized ReRAM-based architecture pipelines the different types of aggregation in HGNNs, fully exploiting their multilevel parallelism. Our experiments show that ChainPIM achieves an average memory space reduction of 47.86% and a performance improvement of $128.29\times$ compared to an NVIDIA Tesla V100 GPU.
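To make the vertex-reuse idea concrete, here is a minimal sketch of grouping metapath instances that share a common prefix, so shared vertices are stored once rather than per instance. This is our own toy simplification for illustration: `build_chains` and the prefix-grouping rule are hypothetical and do not reproduce the paper's actual R-chain construction or storage format.

```python
# Toy sketch (not the paper's algorithm): metapath instances that share
# a vertex prefix are grouped into one "chain", so the shared vertices
# are stored and fetched once instead of once per instance.

from collections import defaultdict

def build_chains(instances):
    """Group metapath instances (tuples of vertex ids) by their shared
    two-vertex prefix; each chain stores the prefix once plus suffixes."""
    chains = defaultdict(list)
    for inst in instances:
        chains[inst[:2]].append(inst[2:])
    return dict(chains)

# Metapath instances of a 3-hop type, e.g. author-paper-author (A-P-A).
instances = [
    (0, 10, 1), (0, 10, 2), (0, 10, 3),  # share prefix (0, 10)
    (4, 11, 5), (4, 11, 6),              # share prefix (4, 11)
]
chains = build_chains(instances)

# Flat storage keeps 5 * 3 = 15 vertex ids; chained storage keeps
# 2 prefixes * 2 ids + 5 suffix ids = 9 ids for this toy example.
flat = sum(len(i) for i in instances)
chained = sum(len(k) + sum(len(s) for s in sufs)
              for k, sufs in chains.items())
```

Because the instances in a chain share their prefix vertices, those vertices' features need to be loaded into the ReRAM arrays only once per chain, while independent chains remain free to be processed in parallel.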
About the journal:
The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical and logical design including: planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.