Hybrid OpenMP-MPI Parallelism: Porting Experiments from Small to Large Clusters
M. Ferretti, L. Santangelo
2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), 2018-03-21
DOI: 10.1109/PDP2018.2018.00051
After a brief introduction to Cross Motif Search and its OpenMP and hybrid OpenMP-MPI implementations, this paper compares the scalability, efficiency and speedup of the hybrid implementation on a small cluster and on a real HPC system, and explains which factors make the application more efficient on the real HPC architecture. Profiling and tracing tools showed that the hybrid implementation cannot exploit OpenMP parallelism because of several factors (heap contention among threads, spin time and overhead time introduced by OpenMP, and thread-safe external functions), so the pure MPI implementation outperforms every hybrid configuration. By characterizing the workload, we also found that the application improves when the order in which tasks are processed is changed. This observation led to a new selection policy, named Longest Job First, which proves the most effective way to submit tasks to all running MPI processes.
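The abstract does not describe how Longest Job First is implemented, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that each task's cost can be estimated before dispatch and that a master hands the next queued task to whichever MPI worker process becomes idle first. No MPI calls are made: the function `simulate_dispatch` and the helpers `fifo_order` and `longest_job_first` are hypothetical names that simulate this dispatch loop to compare the makespan of arrival order against longest-first order. Sorting by descending cost is the same idea as the classic longest-processing-time-first heuristic.

```python
import heapq
from typing import List, Sequence


def simulate_dispatch(costs: Sequence[float], n_workers: int) -> float:
    """Return the makespan when tasks are handed out in the given order
    to whichever worker becomes idle first (dynamic master-worker dispatch).

    Assumption: task costs are known (or estimated) before dispatch.
    """
    if n_workers < 1:
        raise ValueError("need at least one worker")
    # Min-heap of (time at which the worker becomes idle, worker id).
    idle_at = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(idle_at)
    finish = 0.0
    for cost in costs:
        t, w = heapq.heappop(idle_at)  # earliest-idle worker requests work
        done = t + cost
        finish = max(finish, done)
        heapq.heappush(idle_at, (done, w))
    return finish


def fifo_order(costs: Sequence[float]) -> List[float]:
    """Hypothetical baseline: submit tasks in arrival order."""
    return list(costs)


def longest_job_first(costs: Sequence[float]) -> List[float]:
    """Hypothetical LJF ordering: submit the most expensive tasks first."""
    return sorted(costs, reverse=True)


if __name__ == "__main__":
    # Illustrative costs only (not data from the paper): one long task last.
    costs = [1, 1, 1, 1, 1, 1, 6]
    print(simulate_dispatch(fifo_order(costs), 2))         # 9.0
    print(simulate_dispatch(longest_job_first(costs), 2))  # 6.0
```

In the toy run, arrival order leaves the long task for the end, so one worker is still busy long after the other has gone idle; submitting it first lets the short tasks fill the remaining capacity of the other worker.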