The Communication-Overlapped Hybrid Decomposition Parallel Algorithm for Multi-Scale Fluid Simulations

Yi Liu, Xiao-Wei Guo, Chao Li, Canqun Yang, X. Gan, P. Zhang, Yi Wang, Ran Zhao, Sijiang Fan
Published in: Proceedings of the 48th International Conference on Parallel Processing
Publication date: 2019-08-05
DOI: 10.1145/3337821.3337882 (https://doi.org/10.1145/3337821.3337882)
Citations: 3

Abstract

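The MCDPar (Parallel algorithm for multi-scale simulations based on Mesh and BCF Decomposition) algorithm significantly reduced the execution time and improved the parallel scalability of multi-scale fluid simulations. However, a performance bottleneck remains for extremely large-scale parallel simulations. In this paper, we designed a communication-overlapped hybrid decomposition parallel algorithm to improve the performance of the original MCDPar on large-scale clusters. Through non-blocking communication and code scheduling, the communication between the master and slave groups is overlapped with the master process's computation of additional microscopic configuration fields, which improves the parallel efficiency and scalability of the multi-scale solver in large-scale parallel simulations. In the test case with NBCF = 1000 configuration fields and Ncell = 64000 mesh cells, the percentage of time spent in communication between the corresponding master and slave processes is reduced by 39.71%. In the test case with NBCF = 3000 and Ncell = 64000, the communication-overlapped algorithm reduces the fastest execution time by 31.13% and extends good parallel scaling to 256 cores, compared with 128 cores for the original MCDPar.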
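The overlap idea described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: a one-thread `ThreadPoolExecutor` stands in for MPI non-blocking communication (an `MPI_Irecv` followed later by `MPI_Wait`), `time.sleep` stands in for network latency, and `field_contribution`, `accumulate`, `receive_from_slaves`, and `mean_over_fields` are hypothetical names. Each toy "configuration field" returns a fixed per-cell value instead of solving a stochastic equation, and the result is a per-cell average over all fields, loosely mirroring how a configuration-field solver averages over its ensemble. The master posts the receive, computes its own (larger) share of fields while the message is in flight, and only then waits.

```python
# Toy sketch (hypothetical, not the authors' code): a one-thread pool stands in
# for MPI non-blocking communication, and each "configuration field" contributes
# a deterministic per-cell value instead of solving a stochastic equation.
import time
from concurrent.futures import ThreadPoolExecutor

N_CELLS = 4  # hypothetical tiny mesh


def field_contribution(field_id):
    """Placeholder for evolving one configuration field over every mesh cell."""
    return [field_id + 0.5 * cell for cell in range(N_CELLS)]


def accumulate(field_ids):
    """Sum the per-cell contributions of the given fields."""
    total = [0.0] * N_CELLS
    for f in field_ids:
        for cell, value in enumerate(field_contribution(f)):
            total[cell] += value
    return total


def receive_from_slaves(slave_fields, latency_s):
    """Stand-in for the slave group's partial sum arriving over the network."""
    time.sleep(latency_s)  # simulated message latency
    return accumulate(slave_fields)


def mean_over_fields(n_fields, master_fields, slave_fields,
                     latency_s=0.01, overlap=True):
    """Average all field contributions per cell, with or without overlap."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        if overlap:
            # Post the "non-blocking receive" first (analogous to MPI_Irecv) ...
            pending = pool.submit(receive_from_slaves, slave_fields, latency_s)
            # ... compute the master's own, larger share of fields meanwhile ...
            local = accumulate(master_fields)
            # ... and only then wait for the slave data (analogous to MPI_Wait).
            remote = pending.result()
        else:
            # Blocking variant: the master idles until the slave data arrives.
            remote = pool.submit(receive_from_slaves, slave_fields,
                                 latency_s).result()
            local = accumulate(master_fields)
    return [(a + b) / n_fields for a, b in zip(local, remote)]
```

Both modes return the same field-averaged values; for example, `mean_over_fields(6, [0, 1, 2, 3], [4, 5])` gives `[2.5, 3.0, 3.5, 4.0]`. The difference is only in timing: with `overlap=False` the simulated latency is paid before the master starts its own fields, while with `overlap=True` it runs concurrently with them, which is why giving the master more fields to compute can hide the communication cost.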