Preliminary experiences with the Uintah framework on Intel Xeon Phi and Stampede

Proceedings of the Conference on Extreme Science and Engineering Discovery Environment: Gateway to Discovery. Pub Date: 2013-07-22. DOI: 10.1145/2484762.2484779

Qingyu Meng, A. Humphrey, John A. Schmidt, M. Berzins


Citations: 25

Abstract


Preliminary experiences with the Uintah framework on Intel Xeon Phi and Stampede

In this work, we describe our preliminary experiences on the Stampede system in the context of the Uintah Computational Framework. Uintah was developed to provide an environment for solving a broad class of fluid-structure interaction problems on structured adaptive grids. Uintah uses a combination of fluid-flow solvers and particle-based methods, together with a novel asynchronous task-based approach and fully automated load balancing. While we have designed scalable Uintah runtime systems for large CPU core counts, the emergence of heterogeneous systems presents considerable challenges in effectively utilizing additional on-node accelerators and co-processors, exploiting deep memory hierarchies, and managing multiple levels of parallelism. Our recent work has addressed the emergence of heterogeneous CPU/GPU systems with the design of a unified heterogeneous runtime system, enabling Uintah to fully exploit these architectures with support for asynchronous, out-of-order scheduling of both CPU and GPU computational tasks. Using this design, Uintah has run at full scale on the Keeneland System and TitanDev. With the release of the Intel Xeon Phi co-processor and the recent availability of the Stampede system, we show that Uintah may be modified to utilize such a co-processor-based system. We also explore the different usage models provided by the Xeon Phi with the aim of understanding the portability of a general-purpose framework like Uintah to this architecture. These usage models range from the pragma-based offload model to the more complex symmetric model, which utilizes all co-processor and host CPU cores simultaneously. We provide preliminary results of the various usage models for a challenging adaptive mesh refinement problem, as well as a detailed account of our experience adapting Uintah to run on the Stampede system. Our conclusion is that while the Stampede system is easy to use, obtaining high performance from the Xeon Phi co-processors requires a substantial but different investment from that needed for GPU-based systems.


Source journal

Proceedings of the Conference on Extreme Science and Engineering Discovery Environment: Gateway to Discovery

Self-citation rate

0.00%

Number of articles published