Comparison of Parallel Programming Models on Intel MIC Computer Cluster

2014 IEEE International Parallel & Distributed Processing Symposium Workshops. Pub Date: 2014-05-19. DOI: 10.1109/IPDPSW.2014.105

Chenggang Lai, Zhijun Hao, Miaoqing Huang, Xuan Shi, Haihang You


Citations: 6

Abstract

Coprocessors based on the Intel Many Integrated Core (MIC) Architecture have been adopted in many high-performance computer clusters. Typical parallel programming models, such as MPI and OpenMP, are supported on MIC processors to achieve parallelism. In this work, we conduct a detailed study of the performance and scalability of MIC processors under different programming models using the Beacon computer cluster. Our findings are as follows. (1) On the Beacon computer cluster, the native MPI programming model on the MIC processors typically performs better than the offload programming model, which offloads the workload to MIC cores using OpenMP. (2) On top of the native MPI programming model, multithreading inside each MPI process can further improve the performance of parallel applications on computer clusters with MIC coprocessors. (3) Given a fixed number of MPI processes, it is a good strategy to schedule them on as few MIC processors as possible to reduce cross-processor communication overhead. (4) The hybrid MPI programming model, in which data processing is distributed across both MIC cores and CPU cores, can outperform the native MPI programming model.
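Finding (3) can be illustrated with a toy placement model. The sketch below is not taken from the paper: the function names, the rank and device counts, and the assumption that every pair of MPI processes communicates are all hypothetical, chosen only to show why packing a fixed number of processes onto fewer coprocessors leaves fewer process pairs communicating across devices.

```python
from itertools import combinations


def place_packed(num_ranks, ranks_per_device):
    """Fill one coprocessor completely before moving on to the next.

    Returns a list where entry i is the (hypothetical) device index of rank i.
    """
    return [rank // ranks_per_device for rank in range(num_ranks)]


def place_spread(num_ranks, num_devices):
    """Deal ranks round-robin across all available coprocessors."""
    return [rank % num_devices for rank in range(num_ranks)]


def cross_device_pairs(placement):
    """Count rank pairs that sit on different coprocessors.

    Assumes an all-to-all communication pattern, so this count is only a
    rough proxy for cross-processor traffic, not a measured cost.
    """
    return sum(1 for a, b in combinations(placement, 2) if a != b)


if __name__ == "__main__":
    # Illustrative numbers only: 8 ranks, up to 4 coprocessors.
    packed = place_packed(num_ranks=8, ranks_per_device=4)
    spread = place_spread(num_ranks=8, num_devices=4)
    print("packed:", len(set(packed)), "devices,",
          cross_device_pairs(packed), "cross-device pairs")
    print("spread:", len(set(spread)), "devices,",
          cross_device_pairs(spread), "cross-device pairs")
```

In this toy setting the packed placement uses 2 devices with 16 cross-device pairs, while the spread placement uses 4 devices with 24. Real applications rarely communicate all-to-all, so the actual benefit depends on the communication pattern and on how many processes a single coprocessor can host efficiently.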

