Argo NodeOS: Toward Unified Resource Management for Exascale

Swann Perarnau, Judicael A. Zounmevo, Matthieu Dreher, B. V. Essen, R. Gioiosa, K. Iskra, M. Gokhale, Kazutomo Yoshii, P. Beckman
Published in: 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2017-05-01
DOI: https://doi.org/10.1109/IPDPS.2017.25
Citations: 18

Abstract

Exascale systems are expected to feature hundreds of thousands of compute nodes with hundreds of hardware threads and complex memory hierarchies with a mix of on-package and persistent memory modules. In this context, the Argo project is developing a new operating system for exascale machines. Targeting production workloads using workflows or coupled codes, we improve the Linux kernel on several fronts. We extend the memory management of Linux to be able to subdivide NUMA memory nodes, allowing better resource partitioning among processes running on the same node. We also add support for memory-mapped access to node-local, PCIe-attached NVRAM devices and introduce a new scheduling class targeted at parallel runtimes supporting user-level load balancing. These features are unified into compute containers, a containerization approach focused on providing modern HPC applications with dynamic control over a wide range of kernel interfaces. To keep our approach compatible with industrial containerization products, we also identify contention points for the adoption of containers in HPC settings. Each NodeOS feature is evaluated by using a set of parallel benchmarks, miniapps, and coupled applications consisting of simulation and data analysis components, running on a modern NUMA platform. We observe out-of-the-box performance improvements easily matching, and often exceeding, those observed with expert-optimized configurations on standard OS kernels. Our lightweight approach to resource management retains the many benefits of a full OS kernel that application programmers have learned to depend on, while providing a set of extensions that can be freely mixed and matched to best benefit particular application components.
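
As a rough illustration of what memory-mapped access means in general, the sketch below maps a file into memory, writes to it with ordinary stores, and reads the bytes back from the file. It is not the Argo NodeOS interface: the function name `mmap_roundtrip` is hypothetical, and an ordinary temporary file stands in for a file on a node-local NVRAM device, which this listing does not describe in detail.

```python
import mmap
import os
import tempfile


def mmap_roundtrip(payload: bytes) -> bytes:
    """Write a non-empty payload through a memory mapping, then read it back.

    Assumption: a temporary file stands in for an NVRAM-backed file.
    """
    fd, path = tempfile.mkstemp()
    try:
        # The backing file must be sized before it can be mapped.
        os.ftruncate(fd, len(payload))
        with mmap.mmap(fd, len(payload)) as region:
            region[:] = payload  # plain memory writes, no write() calls
            region.flush()       # push modified pages to the backing file
        # Read through the regular file interface to confirm the data landed.
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, len(payload))
    finally:
        os.close(fd)
        os.remove(path)
```

Calling `mmap_roundtrip(b"argo")` returns the same bytes that were written through the mapping. On a real system the mapped path would point at the device-backed file rather than a temporary one.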