Efficient parallel CP decomposition with pairwise perturbation and multi-sweep dimension tree

2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS). Pub Date: 2020-10-22. DOI: 10.1109/IPDPS49936.2021.00049

Linjian Ma, Edgar Solomonik

Citations: 8

Abstract

The cost of the widely used alternating least squares (ALS) algorithm for the canonical polyadic (CP) tensor decomposition is dominated by the matricized-tensor times Khatri-Rao product (MTTKRP) kernel, which is needed to set up the quadratic optimization subproblems. State-of-the-art parallel ALS implementations use dimension trees to avoid redundant computations across MTTKRPs within each ALS sweep. In this paper, we propose two new parallel algorithms to accelerate CP-ALS. We introduce the multi-sweep dimension tree (MSDT) algorithm, which requires the contraction between an order-$N$ input tensor and the first-contracted input matrix once every $(N-1)/N$ sweeps. This algorithm reduces the leading-order computational cost by a factor of $2(N-1)/N$ relative to the best previously known approach. In addition, we introduce a more communication-efficient approach to parallelizing an approximate CP-ALS algorithm, pairwise perturbation. This technique uses perturbative corrections to the subproblems rather than recomputing the contractions, and asymptotically accelerates ALS. Our benchmark results on 1024 processors of the Stampede2 supercomputer show that CP decomposition obtains a 1.25X speedup from MSDT and a 1.94X speedup from pairwise perturbation compared to state-of-the-art dimension-tree-based CP-ALS implementations.
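As a rough illustration of the baseline the abstract refers to, the following NumPy sketch shows an MTTKRP for an order-3 tensor and one sequential CP-ALS sweep. The sweep reuses a single partial contraction for two factor updates, which is the standard dimension-tree idea. This is not the paper's MSDT or pairwise perturbation algorithm, and it is not parallel. The function names (`mttkrp`, `solve_factor`, `als_sweep_dimension_tree`) and the use of a pseudoinverse for the subproblem solve are assumptions made for this example.

```python
import numpy as np


def mttkrp(T, A, B, C, mode):
    """Direct MTTKRP of an order-3 tensor T with every factor except `mode`."""
    if mode == 0:
        return np.einsum("ijk,jr,kr->ir", T, B, C)
    if mode == 1:
        return np.einsum("ijk,ir,kr->jr", T, A, C)
    return np.einsum("ijk,ir,jr->kr", T, A, B)


def solve_factor(M, G1, G2):
    """Solve the quadratic subproblem X @ ((G1^T G1) * (G2^T G2)) = M for X."""
    gram = (G1.T @ G1) * (G2.T @ G2)  # Hadamard product of Gram matrices
    return M @ np.linalg.pinv(gram)   # pinv is an illustrative choice


def als_sweep_dimension_tree(T, A, B, C):
    """One ALS sweep that contracts with T twice instead of three times."""
    # Contract T with C once; the result serves both the A and B updates,
    # because C does not change until the end of the sweep.
    TC = np.einsum("ijk,kr->ijr", T, C)
    A = solve_factor(np.einsum("ijr,jr->ir", TC, B), B, C)
    B = solve_factor(np.einsum("ijr,ir->jr", TC, A), A, C)
    # The C update needs a fresh contraction with T using the new A.
    TA = np.einsum("ijk,ir->jkr", T, A)
    C = solve_factor(np.einsum("jkr,jr->kr", TA, B), A, B)
    return A, B, C
```

Calling `als_sweep_dimension_tree` repeatedly on randomly initialized factors gives a plain CP-ALS iteration. In this sketch the contractions with the full tensor `T` (the two `"ijk,..."` einsum calls) are the expensive steps, and they are the ones that, according to the abstract, MSDT amortizes across sweeps.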
