基于GCN的CPU-GPU异构并行计算SpTTM优化方法

IF 1.2 Q3 COMPUTER SCIENCE, THEORY & METHODS

ACM Transactions on Parallel Computing Pub Date : 2023-02-17 DOI:10.1145/3584373

Hao Wang, Wangdong Yang, Renqiu Ouyang, Rong Hu, Kenli Li, Keqin Li

{"title":"基于GCN的CPU-GPU异构并行计算SpTTM优化方法","authors":"Hao Wang, Wangdong Yang, Renqiu Ouyang, Rong Hu, Kenli Li, Keqin Li","doi":"10.1145/3584373","DOIUrl":null,"url":null,"abstract":"Sparse Tensor-Times-Matrix (SpTTM) is the core calculation in tensor analysis. The sparse distributions of different tensors vary greatly, which poses a big challenge to designing efficient and general SpTTM. In this paper, we describe SpTTM on CPU-GPU heterogeneous hybrid systems and give a parallel execution strategy for SpTTM in different sparse formats. We analyze the theoretical computer powers and estimate the number of tasks to achieve the load balancing between the CPU and the GPU of the heterogeneous systems. We discuss a method to describe tensor sparse structure by graph structure and design a new graph neural network SPT-GCN to select a suitable tensor sparse format. Furthermore, we perform extensive experiments using real datasets to demonstrate the advantages and efficiency of our proposed input-aware slice-wise SpTTM. The experimental results show that our input-aware slice-wise SpTTM can achieve an average speedup of 1.310 × compared to ParTI! library on a CPU-GPU heterogeneous system.","PeriodicalId":42115,"journal":{"name":"ACM Transactions on Parallel Computing","volume":"10 1","pages":"1 - 23"},"PeriodicalIF":1.2000,"publicationDate":"2023-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"A Heterogeneous Parallel Computing Approach Optimizing SpTTM on CPU-GPU via GCN\",\"authors\":\"Hao Wang, Wangdong Yang, Renqiu Ouyang, Rong Hu, Kenli Li, Keqin Li\",\"doi\":\"10.1145/3584373\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Sparse Tensor-Times-Matrix (SpTTM) is the core calculation in tensor analysis. The sparse distributions of different tensors vary greatly, which poses a big challenge to designing efficient and general SpTTM. In this paper, we describe SpTTM on CPU-GPU heterogeneous hybrid systems and give a parallel execution strategy for SpTTM in different sparse formats. We analyze the theoretical computer powers and estimate the number of tasks to achieve the load balancing between the CPU and the GPU of the heterogeneous systems. We discuss a method to describe tensor sparse structure by graph structure and design a new graph neural network SPT-GCN to select a suitable tensor sparse format. Furthermore, we perform extensive experiments using real datasets to demonstrate the advantages and efficiency of our proposed input-aware slice-wise SpTTM. The experimental results show that our input-aware slice-wise SpTTM can achieve an average speedup of 1.310 × compared to ParTI! library on a CPU-GPU heterogeneous system.\",\"PeriodicalId\":42115,\"journal\":{\"name\":\"ACM Transactions on Parallel Computing\",\"volume\":\"10 1\",\"pages\":\"1 - 23\"},\"PeriodicalIF\":1.2000,\"publicationDate\":\"2023-02-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Parallel Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3584373\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Parallel Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3584373","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}

引用次数: 1

摘要

稀疏张量-时间矩阵(SpTTM)是张量分析中的核心计算。不同张量的稀疏分布差异很大，这对设计高效、通用的SpTTM提出了很大的挑战。本文描述了CPU-GPU异构混合系统上的SpTTM，并给出了不同稀疏格式下SpTTM的并行执行策略。通过对理论计算能力的分析和任务数的估计，实现了异构系统中CPU和GPU的负载均衡。讨论了一种用图结构描述张量稀疏结构的方法，并设计了一种新的图神经网络SPT-GCN来选择合适的张量稀疏格式。此外，我们使用真实数据集进行了广泛的实验，以证明我们提出的输入感知切片SpTTM的优势和效率。实验结果表明，与ParTI相比，我们的输入感知切片SpTTM可以实现1.310倍的平均加速!库在CPU-GPU异构系统上。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A Heterogeneous Parallel Computing Approach Optimizing SpTTM on CPU-GPU via GCN

Sparse Tensor-Times-Matrix (SpTTM) is the core calculation in tensor analysis. The sparse distributions of different tensors vary greatly, which poses a big challenge to designing efficient and general SpTTM. In this paper, we describe SpTTM on CPU-GPU heterogeneous hybrid systems and give a parallel execution strategy for SpTTM in different sparse formats. We analyze the theoretical computer powers and estimate the number of tasks to achieve the load balancing between the CPU and the GPU of the heterogeneous systems. We discuss a method to describe tensor sparse structure by graph structure and design a new graph neural network SPT-GCN to select a suitable tensor sparse format. Furthermore, we perform extensive experiments using real datasets to demonstrate the advantages and efficiency of our proposed input-aware slice-wise SpTTM. The experimental results show that our input-aware slice-wise SpTTM can achieve an average speedup of 1.310 × compared to ParTI! library on a CPU-GPU heterogeneous system.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Transactions on Parallel Computing COMPUTER SCIENCE, THEORY & METHODS-

CiteScore

4.10

自引率

0.00%

发文量