Xeon Phi集群节点内和节点间卸载的统一编程模型

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI:10.1109/SC.2014.22

M. Noack, Florian Wende, T. Steinke, F. Cordes

{"title":"Xeon Phi集群节点内和节点间卸载的统一编程模型","authors":"M. Noack, Florian Wende, T. Steinke, F. Cordes","doi":"10.1109/SC.2014.22","DOIUrl":null,"url":null,"abstract":"Standard offload programming models for the Xeon Phi, e.g. Intel LEO and OpenMP 4.0, are restricted to a single compute node and hence a limited number of coprocessors. Scaling applications across a Xeon Phi cluster/supercomputer thus requires hybrid programming approaches, usually MPI+X. In this work, we present a framework based on heterogeneous active messages (HAM-Offload) that provides the means to offload work to local and remote (co)processors using a unified offload API. Since HAM-Offload provides similar primitives as current local offload frameworks, existing applications can be easily ported to overcome the single-node limitation while keeping the convenient offload programming model. We demonstrate the effectiveness of the framework by using it to enable a real-world application from the field of molecular dynamics to use multiple local and remote Xeon Phis. The evaluation shows good scaling behavior. Compared with LEO, performance is equal for large offloads and significantly better for small offloads.","PeriodicalId":275261,"journal":{"name":"SC14: International Conference for High Performance Computing, Networking, Storage and Analysis","volume":"125 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":"{\"title\":\"A Unified Programming Model for Intra- and Inter-Node Offloading on Xeon Phi Clusters\",\"authors\":\"M. Noack, Florian Wende, T. Steinke, F. Cordes\",\"doi\":\"10.1109/SC.2014.22\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Standard offload programming models for the Xeon Phi, e.g. Intel LEO and OpenMP 4.0, are restricted to a single compute node and hence a limited number of coprocessors. Scaling applications across a Xeon Phi cluster/supercomputer thus requires hybrid programming approaches, usually MPI+X. In this work, we present a framework based on heterogeneous active messages (HAM-Offload) that provides the means to offload work to local and remote (co)processors using a unified offload API. Since HAM-Offload provides similar primitives as current local offload frameworks, existing applications can be easily ported to overcome the single-node limitation while keeping the convenient offload programming model. We demonstrate the effectiveness of the framework by using it to enable a real-world application from the field of molecular dynamics to use multiple local and remote Xeon Phis. The evaluation shows good scaling behavior. Compared with LEO, performance is equal for large offloads and significantly better for small offloads.\",\"PeriodicalId\":275261,\"journal\":{\"name\":\"SC14: International Conference for High Performance Computing, Networking, Storage and Analysis\",\"volume\":\"125 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-11-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"16\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"SC14: International Conference for High Performance Computing, Networking, Storage and Analysis\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SC.2014.22\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"SC14: International Conference for High Performance Computing, Networking, Storage and Analysis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SC.2014.22","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 16

摘要

Xeon Phi协处理器的标准卸载编程模型，例如Intel LEO和OpenMP 4.0，仅限于单个计算节点，因此协处理器数量有限。因此，跨Xeon Phi集群/超级计算机扩展应用程序需要混合编程方法，通常是MPI+X。在这项工作中，我们提出了一个基于异构活动消息(HAM-Offload)的框架，该框架提供了使用统一的卸载API将工作卸载到本地和远程(co)处理器的方法。由于HAM-Offload提供了与当前本地卸载框架类似的原语，因此可以很容易地移植现有应用程序，以克服单节点限制，同时保持方便的卸载编程模型。我们通过使用它来证明该框架的有效性，使来自分子动力学领域的实际应用能够使用多个本地和远程Xeon Phis。评价结果表明，该材料具有良好的标度性能。与LEO相比，大卸载的性能是一样的，小卸载的性能明显更好。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A Unified Programming Model for Intra- and Inter-Node Offloading on Xeon Phi Clusters

Standard offload programming models for the Xeon Phi, e.g. Intel LEO and OpenMP 4.0, are restricted to a single compute node and hence a limited number of coprocessors. Scaling applications across a Xeon Phi cluster/supercomputer thus requires hybrid programming approaches, usually MPI+X. In this work, we present a framework based on heterogeneous active messages (HAM-Offload) that provides the means to offload work to local and remote (co)processors using a unified offload API. Since HAM-Offload provides similar primitives as current local offload frameworks, existing applications can be easily ported to overcome the single-node limitation while keeping the convenient offload programming model. We demonstrate the effectiveness of the framework by using it to enable a real-world application from the field of molecular dynamics to use multiple local and remote Xeon Phis. The evaluation shows good scaling behavior. Compared with LEO, performance is equal for large offloads and significantly better for small offloads.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis

自引率

0.00%

发文量