T. Tang, Hao Wu, Wei Bao, Pengyi Yang, Dong Yuan, B. Zhou
2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), December 2019. DOI: https://doi.org/10.1109/PDCAT46702.2019.00045
New Parallel Algorithms for All Pairwise Computation on Large HPC Clusters
All pairwise computation performs a computation between every pair of elements in a given dataset. It is often a necessary first step in bioinformatics applications. Many such applications require multiple terabytes of main memory and multiple peta floating-point operations to complete, so large HPC clusters are needed to tackle them. Conventionally designed parallel algorithms based on data partitioning may scale poorly: for a problem of fixed size, efficiency may decrease as the number of compute nodes increases (Amdahl's law). In this paper we introduce a new method for parallel algorithm design. Using this method, we first design an efficient one-dimensional (1D) ring algorithm and then a two-dimensional (2D) algorithm based on the 1D ring for all pairwise computation. When compute nodes are added, instead of reducing the block size, we make multiple copies of the original data blocks in the 1D ring and distribute them across the added compute nodes in the other dimension. By properly organizing the compute nodes, the communication overhead can be reduced to a minimum in this two-dimensional setting. Experiments on a Cray XC40 supercomputer show that our new algorithms are efficient and scalable for large-scale all pairwise computation on large HPC clusters.
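The abstract does not spell out the ring schedule, so the following is only a minimal sketch of a generic, textbook-style 1D ring scheme for all pairwise computation, not necessarily the authors' algorithm. It simulates the schedule in a single Python process (no MPI). The function name `ring_all_pairs`, its parameters, and the contiguous block layout are assumptions made for illustration. Each simulated node keeps one resident block, a second copy of every block travels one hop around the ring per step, and after floor(P/2) steps every pair of elements has been assigned to exactly one node.

```python
from itertools import combinations


def ring_all_pairs(num_elements, num_nodes):
    """Simulate a generic 1D ring schedule (illustrative, not the paper's code).

    Returns a list of (node, i, j) triples with i < j, meaning that the given
    node is responsible for computing the pair of elements (i, j).
    """
    # Split element indices into contiguous blocks whose sizes differ by at most 1.
    base, extra = divmod(num_elements, num_nodes)
    blocks, start = [], 0
    for p in range(num_nodes):
        size = base + (1 if p < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size

    work = []

    # Step 0: every node handles the pairs inside its own resident block.
    for p in range(num_nodes):
        work.extend((p, i, j) for i, j in combinations(blocks[p], 2))

    # traveling[p] is the id of the block copy currently held by node p.
    traveling = list(range(num_nodes))
    for step in range(1, num_nodes // 2 + 1):
        # Each node passes its traveling block to the next node on the ring.
        traveling = [traveling[(p - 1) % num_nodes] for p in range(num_nodes)]
        for p in range(num_nodes):
            # With an even node count, the final step would pair opposite
            # blocks twice, so only the first half of the nodes computes.
            if 2 * step == num_nodes and p >= num_nodes // 2:
                continue
            q = traveling[p]
            work.extend(
                (p, min(i, j), max(i, j)) for i in blocks[p] for j in blocks[q]
            )
    return work


if __name__ == "__main__":
    schedule = ring_all_pairs(num_elements=10, num_nodes=4)
    per_node = {}
    for node, _, _ in schedule:
        per_node[node] = per_node.get(node, 0) + 1
    print("pairs per node:", per_node)
```

In this sketch, adding nodes shrinks every block, which is the fixed-size scaling situation the abstract describes for conventional data partitioning. The abstract's 2D approach instead keeps the block size and replicates the ring's blocks across the added nodes; that replication is not modeled here.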