Distributed-Memory Algorithms for Maximal Cardinality Matching Using Matrix Algebra
A. Azad, A. Buluç
2015 IEEE International Conference on Cluster Computing, 2015-09-08
DOI: 10.1109/CLUSTER.2015.62 (https://doi.org/10.1109/CLUSTER.2015.62)
Citations: 6
Abstract
We design and implement distributed-memory parallel algorithms for computing maximal cardinality matching in a bipartite graph. Relying on matrix algebra building blocks, our algorithms expose a higher degree of parallelism on distributed-memory platforms than existing graph-based algorithms. In contrast to existing parallel algorithms, empirical approximation ratios of the new algorithms are insensitive to concurrency and stay relatively constant with increasing processor counts. On real instances, our algorithms achieve up to 300x speedup on 1024 cores of a Cray XC30 supercomputer. Even higher speedups are obtained on larger synthetically generated graphs where our algorithms show good scaling on up to 16,384 processors.
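To make the idea of building a matching from matrix algebra primitives more concrete, here is a minimal serial sketch in Python. It is not the authors' distributed implementation: the function names (`maximal_matching`, `is_maximal`), the column-list graph representation, and the smallest-index tie-breaking rule are all assumptions chosen for illustration. Each round mimics a sparse matrix-vector product over a (min, select-column) semiring, followed by an inversion step that resolves conflicts between rows that chose the same column.

```python
def maximal_matching(cols, n_rows):
    """Greedy maximal matching on a bipartite graph (illustrative sketch).

    cols[j] lists the row vertices adjacent to column vertex j, i.e. the
    nonzero positions in column j of the sparse biadjacency matrix.
    Returns (mate_row, mate_col), with -1 marking unmatched vertices.
    """
    mate_row = [-1] * n_rows
    mate_col = [-1] * len(cols)
    while True:
        # SpMV-like step over a (min, select-column) semiring: every
        # unmatched row records the smallest unmatched column adjacent to it.
        cand = {}
        for j, rows in enumerate(cols):
            if mate_col[j] != -1:
                continue
            for i in rows:
                if mate_row[i] == -1 and (i not in cand or j < cand[i]):
                    cand[i] = j
        if not cand:
            break  # no edge joins two unmatched vertices, so we are done
        # Inversion step: each chosen column keeps the smallest row that
        # selected it, so no column is matched twice in one round.
        pick = {}
        for i, j in cand.items():
            if j not in pick or i < pick[j]:
                pick[j] = i
        for j, i in pick.items():
            mate_row[i] = j
            mate_col[j] = i
    return mate_row, mate_col


def is_maximal(cols, mate_row, mate_col):
    """Return True if no edge has both of its endpoints unmatched."""
    return all(mate_col[j] != -1 or mate_row[i] != -1
               for j, rows in enumerate(cols) for i in rows)


if __name__ == "__main__":
    # Hypothetical 3x3 example: column 0 touches rows 0 and 1,
    # column 1 touches row 0, column 2 touches rows 1 and 2.
    cols = [[0, 1], [0], [1, 2]]
    mate_row, mate_col = maximal_matching(cols, 3)
    print(mate_row, mate_col, is_maximal(cols, mate_row, mate_col))
```

On this small example the sketch matches two of the three column vertices, although a perfect matching of size three exists. That gap is what an approximation ratio measures: a maximal matching cannot be extended by adding an edge, but it need not have maximum cardinality.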