混合OpenMP-MPI并行:从小型集群到大型集群的移植实验

2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP) Pub Date : 2018-03-21 DOI:10.1109/PDP2018.2018.00051

M. Ferretti, L. Santangelo

{"title":"混合OpenMP-MPI并行:从小型集群到大型集群的移植实验","authors":"M. Ferretti, L. Santangelo","doi":"10.1109/PDP2018.2018.00051","DOIUrl":null,"url":null,"abstract":"After a brief introduction on Cross Motif Search and its OpenMP and Hybrid OpenMP-MPI implementations, this paper compares the scalability, efficiency and speedup of the hybrid implementation on a small cluster and on a real HPC system, explaining which factors make the application more efficient when it runs on the real HPC architecture. Using profiling and tracing tools highlighted that the hybrid implementation cannot exploit the OpenMP parallelism because of different factors (heap contention among the threads, spin time and overhead time introduced by OpenMP and thread-safe external functions), making the pure MPI implementation better than any other hybrid one. By characterizing of the workload, we also discovered that the application gets improved by changing the order with which tasks are processed. This observation leads to the introduction of a new selection policy, named Longest Job First. The new policy represents a winning solution for tasks submission among all running MPI processes.","PeriodicalId":333367,"journal":{"name":"2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2018-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Hybrid OpenMP-MPI Parallelism: Porting Experiments from Small to Large Clusters\",\"authors\":\"M. Ferretti, L. Santangelo\",\"doi\":\"10.1109/PDP2018.2018.00051\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"After a brief introduction on Cross Motif Search and its OpenMP and Hybrid OpenMP-MPI implementations, this paper compares the scalability, efficiency and speedup of the hybrid implementation on a small cluster and on a real HPC system, explaining which factors make the application more efficient when it runs on the real HPC architecture. Using profiling and tracing tools highlighted that the hybrid implementation cannot exploit the OpenMP parallelism because of different factors (heap contention among the threads, spin time and overhead time introduced by OpenMP and thread-safe external functions), making the pure MPI implementation better than any other hybrid one. By characterizing of the workload, we also discovered that the application gets improved by changing the order with which tasks are processed. This observation leads to the introduction of a new selection policy, named Longest Job First. The new policy represents a winning solution for tasks submission among all running MPI processes.\",\"PeriodicalId\":333367,\"journal\":{\"name\":\"2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-03-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/PDP2018.2018.00051\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PDP2018.2018.00051","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 6

摘要

本文简要介绍了Cross Motif Search及其OpenMP和混合OpenMP- mpi实现，比较了混合OpenMP- mpi在小型集群和实际HPC系统上的可扩展性、效率和加速，解释了哪些因素使应用程序在实际HPC架构上运行时更高效。使用分析和跟踪工具强调，由于不同的因素(线程之间的堆争用、自旋转时间和OpenMP引入的开销时间以及线程安全的外部函数)，混合实现无法利用OpenMP的并行性，这使得纯MPI实现优于任何其他混合实现。通过描述工作负载的特征，我们还发现，通过改变处理任务的顺序，应用程序得到了改进。这一观察结果导致引入了一种新的选择策略，称为最长作业优先。新策略代表了在所有运行的MPI进程之间提交任务的成功解决方案。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Hybrid OpenMP-MPI Parallelism: Porting Experiments from Small to Large Clusters

After a brief introduction on Cross Motif Search and its OpenMP and Hybrid OpenMP-MPI implementations, this paper compares the scalability, efficiency and speedup of the hybrid implementation on a small cluster and on a real HPC system, explaining which factors make the application more efficient when it runs on the real HPC architecture. Using profiling and tracing tools highlighted that the hybrid implementation cannot exploit the OpenMP parallelism because of different factors (heap contention among the threads, spin time and overhead time introduced by OpenMP and thread-safe external functions), making the pure MPI implementation better than any other hybrid one. By characterizing of the workload, we also discovered that the application gets improved by changing the order with which tasks are processed. This observation leads to the introduction of a new selection policy, named Longest Job First. The new policy represents a winning solution for tasks submission among all running MPI processes.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)

自引率

0.00%

发文量