基于GCN的CPU-GPU异构并行计算SpTTM优化方法

Pub Date : 2023-02-17 DOI:10.1145/3584373
Hao Wang, Wangdong Yang, Renqiu Ouyang, Rong Hu, Kenli Li, Keqin Li
{"title":"基于GCN的CPU-GPU异构并行计算SpTTM优化方法","authors":"Hao Wang, Wangdong Yang, Renqiu Ouyang, Rong Hu, Kenli Li, Keqin Li","doi":"10.1145/3584373","DOIUrl":null,"url":null,"abstract":"Sparse Tensor-Times-Matrix (SpTTM) is the core calculation in tensor analysis. The sparse distributions of different tensors vary greatly, which poses a big challenge to designing efficient and general SpTTM. In this paper, we describe SpTTM on CPU-GPU heterogeneous hybrid systems and give a parallel execution strategy for SpTTM in different sparse formats. We analyze the theoretical computer powers and estimate the number of tasks to achieve the load balancing between the CPU and the GPU of the heterogeneous systems. We discuss a method to describe tensor sparse structure by graph structure and design a new graph neural network SPT-GCN to select a suitable tensor sparse format. Furthermore, we perform extensive experiments using real datasets to demonstrate the advantages and efficiency of our proposed input-aware slice-wise SpTTM. The experimental results show that our input-aware slice-wise SpTTM can achieve an average speedup of 1.310 × compared to ParTI! library on a CPU-GPU heterogeneous system.","PeriodicalId":0,"journal":{"name":"","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"A Heterogeneous Parallel Computing Approach Optimizing SpTTM on CPU-GPU via GCN\",\"authors\":\"Hao Wang, Wangdong Yang, Renqiu Ouyang, Rong Hu, Kenli Li, Keqin Li\",\"doi\":\"10.1145/3584373\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Sparse Tensor-Times-Matrix (SpTTM) is the core calculation in tensor analysis. The sparse distributions of different tensors vary greatly, which poses a big challenge to designing efficient and general SpTTM. In this paper, we describe SpTTM on CPU-GPU heterogeneous hybrid systems and give a parallel execution strategy for SpTTM in different sparse formats. We analyze the theoretical computer powers and estimate the number of tasks to achieve the load balancing between the CPU and the GPU of the heterogeneous systems. We discuss a method to describe tensor sparse structure by graph structure and design a new graph neural network SPT-GCN to select a suitable tensor sparse format. Furthermore, we perform extensive experiments using real datasets to demonstrate the advantages and efficiency of our proposed input-aware slice-wise SpTTM. The experimental results show that our input-aware slice-wise SpTTM can achieve an average speedup of 1.310 × compared to ParTI! library on a CPU-GPU heterogeneous system.\",\"PeriodicalId\":0,\"journal\":{\"name\":\"\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0,\"publicationDate\":\"2023-02-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3584373\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3584373","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

稀疏张量-时间矩阵(SpTTM)是张量分析中的核心计算。不同张量的稀疏分布差异很大,这对设计高效、通用的SpTTM提出了很大的挑战。本文描述了CPU-GPU异构混合系统上的SpTTM,并给出了不同稀疏格式下SpTTM的并行执行策略。通过对理论计算能力的分析和任务数的估计,实现了异构系统中CPU和GPU的负载均衡。讨论了一种用图结构描述张量稀疏结构的方法,并设计了一种新的图神经网络SPT-GCN来选择合适的张量稀疏格式。此外,我们使用真实数据集进行了广泛的实验,以证明我们提出的输入感知切片SpTTM的优势和效率。实验结果表明,与ParTI相比,我们的输入感知切片SpTTM可以实现1.310倍的平均加速!库在CPU-GPU异构系统上。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
分享
查看原文
A Heterogeneous Parallel Computing Approach Optimizing SpTTM on CPU-GPU via GCN
Sparse Tensor-Times-Matrix (SpTTM) is the core calculation in tensor analysis. The sparse distributions of different tensors vary greatly, which poses a big challenge to designing efficient and general SpTTM. In this paper, we describe SpTTM on CPU-GPU heterogeneous hybrid systems and give a parallel execution strategy for SpTTM in different sparse formats. We analyze the theoretical computer powers and estimate the number of tasks to achieve the load balancing between the CPU and the GPU of the heterogeneous systems. We discuss a method to describe tensor sparse structure by graph structure and design a new graph neural network SPT-GCN to select a suitable tensor sparse format. Furthermore, we perform extensive experiments using real datasets to demonstrate the advantages and efficiency of our proposed input-aware slice-wise SpTTM. The experimental results show that our input-aware slice-wise SpTTM can achieve an average speedup of 1.310 × compared to ParTI! library on a CPU-GPU heterogeneous system.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信