Performance analysis of parallel smoothed particle hydrodynamics on multi-core CPUs
Cheng Wenbo, Yucheng Yao, Yang Zhang
Proceedings of 2014 International Conference on Cloud Computing and Internet of Things, 2014-12-01
DOI: 10.1109/CCIOT.2014.7062511
Citations: 1
Abstract
This paper presents a parallel smoothed particle hydrodynamics (SPH) implementation for multi-core CPUs. The implementation stores particle data in a hash table and divides the program code into two parts for parallelization: the first part is free of data races, whereas the second part contains them. The paper then compares the running time and parallel speedup of each part to locate the bottleneck of the parallel SPH program. The results show that, when the search radius is large, parallelizing the first part alone is enough to achieve linear speedup. The second part becomes a performance bottleneck only when the search radius is small enough that each cell contains just one or two particles on average. We present a method to parallelize the second part without affecting the performance of the first part, and the results show that it eases the performance bottleneck when the search radius is small.
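The split between a race-free part and a racy part can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state: the hash table maps each grid cell (side length equal to the search radius h) to the indices of the particles inside it; the race-free part is a per-particle density sum over the 3x3 neighbouring cells; and the racy part is building the table, where several particles may land in the same cell. The 2D poly6 kernel, the function names `cell_of`, `poly6_2d`, `chunks`, `build_table`, and `compute_density`, and the privatize-then-merge fix for the race are hypothetical illustrative choices, not the paper's method.

```python
import math
import random
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def cell_of(p, h):
    """Integer grid cell containing 2D point p; the cell side equals the search radius h."""
    return (math.floor(p[0] / h), math.floor(p[1] / h))


def poly6_2d(r2, h):
    """2D poly6 smoothing kernel from the squared distance r2 (zero for r >= h).
    Assumed kernel choice; the abstract does not name one."""
    if r2 >= h * h:
        return 0.0
    return 4.0 / (math.pi * h ** 8) * (h * h - r2) ** 3


def chunks(n, workers):
    """Split range(n) into contiguous index ranges, one per worker."""
    step = max(1, math.ceil(n / workers))
    return [range(i, min(i + step, n)) for i in range(0, n, step)]


def build_table(pos, h, workers=4):
    """Racy step: many particles can map to the same cell. Instead of locking a
    shared table, each worker fills a private table (hypothetical fix), and the
    private tables are merged serially afterwards in worker order."""
    def fill(idx):
        local = defaultdict(list)
        for i in idx:
            local[cell_of(pos[i], h)].append(i)
        return local

    table = defaultdict(list)
    with ThreadPoolExecutor(workers) as ex:
        for local in ex.map(fill, chunks(len(pos), workers)):
            for key, ids in local.items():
                table[key].extend(ids)
    return table


def compute_density(pos, mass, h, table, workers=4):
    """Race-free step: each thread writes only rho[i] for the particles it owns and
    only reads the shared positions and table, so no synchronization is needed."""
    rho = [0.0] * len(pos)

    def work(idx):
        for i in idx:
            cx, cy = cell_of(pos[i], h)
            total = 0.0
            # With cell side h, every neighbour closer than h lies in the 3x3 block.
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    for j in table.get((cx + dx, cy + dy), ()):
                        rx = pos[i][0] - pos[j][0]
                        ry = pos[i][1] - pos[j][1]
                        total += mass * poly6_2d(rx * rx + ry * ry, h)
            rho[i] = total

    with ThreadPoolExecutor(workers) as ex:
        list(ex.map(work, chunks(len(pos), workers)))  # list() re-raises worker errors
    return rho


if __name__ == "__main__":
    random.seed(0)
    h = 0.1
    pos = [(random.random(), random.random()) for _ in range(500)]
    table = build_table(pos, h)
    rho = compute_density(pos, 1.0, h, table)
    print(f"{len(table)} occupied cells, mean density {sum(rho) / len(rho):.1f}")
```

Under these assumptions, shrinking h reduces the neighbour work per particle in `compute_density`, while `build_table` still touches every particle once, which fits the abstract's observation that the second part only dominates at small search radii. CPython's global interpreter lock serializes these pure-Python loops, so the sketch shows the data-access pattern rather than actual speedup; a performance-oriented version would be written in a compiled language with a threading model such as OpenMP.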