Efficient parallel CP decomposition with pairwise perturbation and multi-sweep dimension tree

Linjian Ma, Edgar Solomonik
{"title":"Efficient parallel CP decomposition with pairwise perturbation and multi-sweep dimension tree","authors":"Linjian Ma, Edgar Solomonik","doi":"10.1109/IPDPS49936.2021.00049","DOIUrl":null,"url":null,"abstract":"The widely used alternating least squares (ALS) algorithm for the canonical polyadic (CP) tensor decomposition is dominated in cost by the matricized-tensor times Khatri-Rao product (MTTKRP) kernel. This kernel is necessary to set up the quadratic optimization subproblems. State-of-the-art parallel ALS implementations use dimension trees to avoid redundant computations across MTTKRPs within each ALS sweep. In this paper, we propose two new parallel algorithms to accelerate CP-ALS. We introduce the multi-sweep dimension tree (MSDT) algorithm, which requires the contraction between an order N input tensor and the first-contracted input matrix once every $(N-1)/N$ sweeps. This algorithm reduces the leading order computational cost by a factor of $2(N-1)/N$ relative to the best previously known approach. In addition, we introduce a more communication-efficient approach to parallelizing an approximate CP-ALS algorithm, pairwise perturbation. This technique uses perturbative corrections to the subproblems rather than recomputing the contractions, and asymptotically accelerates ALS. Our benchmark results on 1024 processors on the Stampede2 supercomputer show that CP decomposition obtains a 1.25X speed-up from MSDT and a 1.94X speedup from pairwise perturbation compared to the state-of-the-art dimension-tree based CP-ALS implementations.","PeriodicalId":372234,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"107 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS49936.2021.00049","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8

Abstract

The widely used alternating least squares (ALS) algorithm for the canonical polyadic (CP) tensor decomposition is dominated in cost by the matricized-tensor times Khatri-Rao product (MTTKRP) kernel. This kernel is necessary to set up the quadratic optimization subproblems. State-of-the-art parallel ALS implementations use dimension trees to avoid redundant computations across MTTKRPs within each ALS sweep. In this paper, we propose two new parallel algorithms to accelerate CP-ALS. We introduce the multi-sweep dimension tree (MSDT) algorithm, which requires the contraction between an order N input tensor and the first-contracted input matrix once every $(N-1)/N$ sweeps. This algorithm reduces the leading order computational cost by a factor of $2(N-1)/N$ relative to the best previously known approach. In addition, we introduce a more communication-efficient approach to parallelizing an approximate CP-ALS algorithm, pairwise perturbation. This technique uses perturbative corrections to the subproblems rather than recomputing the contractions, and asymptotically accelerates ALS. Our benchmark results on 1024 processors on the Stampede2 supercomputer show that CP decomposition obtains a 1.25X speed-up from MSDT and a 1.94X speedup from pairwise perturbation compared to the state-of-the-art dimension-tree based CP-ALS implementations.
基于两两摄动和多扫描维树的高效并行CP分解
常用的正则多进(CP)张量分解交替最小二乘(ALS)算法的开销主要是矩阵张量乘以Khatri-Rao积(MTTKRP)核。这个核是建立二次优化子问题所必需的。最先进的并行ALS实现使用维度树来避免每次ALS扫描中跨mttkrp的冗余计算。在本文中,我们提出了两种新的并行算法来加速CP-ALS。我们引入了多扫描维树(MSDT)算法,该算法要求每$(N-1)/N$扫描一次N阶输入张量与第一次收缩的输入矩阵之间的收缩。该算法相对于先前已知的最佳方法减少了2美元(N-1)/N美元的领先顺序计算成本。此外,我们还引入了一种通信效率更高的方法来并行化近似的CP-ALS算法,即成对摄动。该技术对子问题使用微扰修正而不是重新计算收缩,并且渐近地加速了渐近逼近。我们在Stampede2超级计算机上的1024个处理器上的基准测试结果表明,与最先进的基于维树的CP- als实现相比,CP分解从MSDT中获得1.25倍的加速,从两两扰动中获得1.94倍的加速。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信