The Communication-Overlapped Hybrid Decomposition Parallel Algorithm for Multi-Scale Fluid Simulations

Proceedings of the 48th International Conference on Parallel Processing; Pub Date: 2019-08-05; DOI: 10.1145/3337821.3337882

Yi Liu, Xiao-Wei Guo, Chao Li, Canqun Yang, X. Gan, P. Zhang, Yi Wang, Ran Zhao, Sijiang Fan


Citations: 3

Abstract


The MCDPar algorithm (Parallel algorithm for multi-scale simulations based on Mesh and BCF Decomposition) significantly reduced the execution time and improved the parallel scalability of multi-scale fluid simulations. However, a performance bottleneck remains for extremely large-scale parallel simulations. In this paper, we design a communication-overlapped hybrid decomposition parallel algorithm to improve the performance of the original MCDPar on large-scale clusters. Through non-blocking communication and code scheduling, the communication overhead between the master and slave groups is overlapped with the computation of more microscopic configuration fields on the master process. The parallel efficiency and scalability of the multi-scale solver can thus be improved for large-scale parallel simulations. In the test case with the number of configuration fields NBCF = 1000 and the number of mesh cells Ncell = 64000, the communication percentage between the corresponding master and slave processes is reduced by 39.71%. In the test case with NBCF = 3000 and Ncell = 64000, the communication-overlapped algorithm reduces the time cost of the fastest execution by 31.13% and scales better, on 256 cores compared with 128 cores for the original algorithm.
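
The overlap idea described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation and uses no MPI: a worker thread stands in for a non-blocking master/slave exchange, and `time.sleep` stands in for both the transfer and the per-field computation. All names and timing constants (`exchange_with_slaves`, `compute_field`, `COMM_SECONDS`, `FIELD_SECONDS`) are hypothetical and chosen only for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

COMM_SECONDS = 0.2    # assumed cost of one master/slave exchange (illustrative)
FIELD_SECONDS = 0.05  # assumed cost of computing one configuration field


def exchange_with_slaves(payload):
    """Stand-in for a master/slave transfer; sleeps instead of sending data."""
    time.sleep(COMM_SECONDS)
    return sum(payload)


def compute_field(index):
    """Stand-in for evolving one configuration field on the master."""
    time.sleep(FIELD_SECONDS)
    return index * index


def blocking_step(n_fields):
    """Communicate first, then compute: the two costs add up."""
    start = time.perf_counter()
    reply = exchange_with_slaves(range(n_fields))
    fields = [compute_field(i) for i in range(n_fields)]
    return reply, fields, time.perf_counter() - start


def overlapped_step(n_fields, pool):
    """Post the exchange, compute while it is in flight, then wait for it."""
    start = time.perf_counter()
    pending = pool.submit(exchange_with_slaves, range(n_fields))  # returns at once
    fields = [compute_field(i) for i in range(n_fields)]  # work done during transfer
    reply = pending.result()  # block only here, once the local work is finished
    return reply, fields, time.perf_counter() - start


def run_demo(n_fields=4):
    """Run both variants and return (blocking_time, overlapped_time)."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        reply_b, fields_b, t_block = blocking_step(n_fields)
        reply_o, fields_o, t_over = overlapped_step(n_fields, pool)
    # Both schedules must produce the same results; only the timing differs.
    assert (reply_b, fields_b) == (reply_o, fields_o)
    return t_block, t_over


if __name__ == "__main__":
    t_block, t_over = run_demo()
    print(f"blocking: {t_block:.2f}s, overlapped: {t_over:.2f}s")
```

In the blocking variant the elapsed time is roughly the sum of the communication and computation costs, while in the overlapped variant it approaches the larger of the two. In an MPI code, the same pattern would typically be written with non-blocking calls such as `MPI_Isend` and `MPI_Irecv` followed by `MPI_Wait`; how the paper schedules these calls is not stated in the abstract.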
