Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale
Md Taufique Hussain, Oguz Selvitopi, A. Buluç, A. Azad
2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
DOI: 10.1109/IPDPS49936.2021.00018
Citations: 9
Abstract
We present a distributed-memory algorithm for sparse matrix-matrix multiplication (SpGEMM) of extremely large matrices where the generated output is larger than the aggregate memory of a target supercomputer. We address this challenge by splitting the computation into batches, with each batch generating a set of output columns. We developed a distributed symbolic step to estimate the memory requirement and determine the number of batches beforehand. We integrated the multiplication in each batch with existing communication-avoiding techniques to reduce communication overhead while multiplying matrices on a 3D process grid. Furthermore, we made the in-node computations faster by designing a sort-free SpGEMM and merging algorithm. Incorporating all the proposed approaches, our SpGEMM scales to large protein-similarity networks using 262,144 cores on a Cray XC40 supercomputer while achieving a 10x speedup using 16x more nodes. Our code is available as part of the Combinatorial BLAS library (https://github.com/PASSIONLab/CombBLAS).
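To make the batching strategy concrete, below is a minimal single-node sketch in Python/SciPy. It is an illustration under simplified assumptions, not the paper's distributed implementation: a symbolic pass upper-bounds the output size of each column of C = A·B, batch boundaries are chosen so each batch's bound fits a memory budget (measured here in output nonzeros), and the batches are multiplied and concatenated. The function names and the budget parameter are hypothetical.

```python
# Minimal single-node sketch of batched SpGEMM (hypothetical, not the
# paper's distributed 3D-grid code): a symbolic pass bounds the output
# size per column, then C = A @ B is formed one column batch at a time.
import numpy as np
import scipy.sparse as sp

def symbolic_bound_per_column(A, B):
    """Upper bound on nnz of each output column of C = A @ B:
    for column j of B, sum nnz(A[:, k]) over the nonzero rows k of B[:, j].
    This mirrors, in miniature, the role of the paper's symbolic step."""
    A = A.tocsc()
    B = B.tocsc()
    nnz_per_col_A = np.diff(A.indptr)  # nnz in each column of A
    bounds = np.empty(B.shape[1], dtype=np.int64)
    for j in range(B.shape[1]):
        rows = B.indices[B.indptr[j]:B.indptr[j + 1]]
        bounds[j] = nnz_per_col_A[rows].sum()
    return bounds

def batched_spgemm(A, B, max_nnz_per_batch):
    """Form C = A @ B in column batches whose symbolic bounds each fit
    the given budget; a single over-budget column gets its own batch."""
    bounds = symbolic_bound_per_column(A, B)
    cuts, acc = [0], 0
    for j, b in enumerate(bounds):
        if acc + b > max_nnz_per_batch and j > cuts[-1]:
            cuts.append(j)
            acc = 0
        acc += b
    cuts.append(B.shape[1])
    Bc = B.tocsc()
    parts = [A @ Bc[:, lo:hi] for lo, hi in zip(cuts[:-1], cuts[1:])]
    return sp.hstack(parts, format="csr"), len(parts)

if __name__ == "__main__":
    rng = np.random.default_rng(seed=0)
    A = sp.random(2000, 2000, density=0.002, random_state=rng, format="csr")
    B = sp.random(2000, 2000, density=0.002, random_state=rng, format="csc")
    C, nbatches = batched_spgemm(A, B, max_nnz_per_batch=50_000)
    assert np.allclose(C.toarray(), (A @ B).toarray())
    print(f"{nbatches} batches, nnz(C) = {C.nnz}")
```

In the actual algorithm, the symbolic step and each batch's multiplication run distributed across a 3D process grid with communication-avoiding replication; the sketch above captures only the batching and symbolic-sizing logic.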