{"title":"Jitter-Trace: a low-overhead OS noise tracing tool based on Linux Perf","authors":"N. Gonzalez, Alessandro Morari, Fabio Checconi","doi":"10.1145/3095770.3095772","DOIUrl":"https://doi.org/10.1145/3095770.3095772","url":null,"abstract":"Operating System (OS) noise is a well-known phenomenon in which OS activities interfere with the execution of large-scale parallel applications. Due to OS noise, feature-rich software environments such as Linux can seriously affect scalability. Kernel tracing can be used to identify OS noise sources, but until recently it required substantial OS modifications. This paper presents Jitter-Trace, a low-overhead tool that identifies and quantifies jitter sources. Jitter-Trace calculates the jitter generated by each OS activity, providing a complete set of task profiles and histograms of OS noise. This data is essential to implement OS noise mitigation strategies and reduce its impact on scalability. Jitter-Trace leverages the tracing and profiling capabilities of Linux Perf, which is widely available in current Linux distributions. Perf is tightly integrated in the Linux kernel and features a lightweight implementation.","PeriodicalId":205790,"journal":{"name":"Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers ROSS 2017","volume":"2673 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127033604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UNITY: Unified Memory and File Space","authors":"T. Jones, Michael J. Brim, Geoffroy R. Vallée, B. Mayer, A. Welch, Tonglin Li, M. Lang, Latchesar Ionkov, Douglas Otstott, Ada Gavrilovska, G. Eisenhauer, Thaleia Dimitra Doudali, Pradeep R. Fernando","doi":"10.1145/3095770.3095776","DOIUrl":"https://doi.org/10.1145/3095770.3095776","url":null,"abstract":"This paper describes the vision for UNITY, a new high-performance computing focused data storage abstraction that places the entire memory hierarchy, including both traditionally separated memory-and file-based data storage, into one storage continuum. Through the use of a novel API and a set of services centered around a smart runtime system, UNITY is able to provide a number of valuable and interesting benefits. The unified storage space provides a scalable and resilient data environment that dynamically manages the mapping of data onto available resources based on multiple factors, including desired persistence and energy budget considerations. By eliminating the need for high-performance computing domain scientists to develop architecture-dependent optimizations for rapidly evolving data storage technologies, UNITY addresses both ease-of-use and performance.","PeriodicalId":205790,"journal":{"name":"Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers ROSS 2017","volume":"90 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124162972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scheduling Chapel Tasks with Qthreads on Manycore: A Tale of Two Schedulers","authors":"N. Evans, Stephen L. Olivier, R. Barrett, George Stelle","doi":"10.1145/3095770.3095774","DOIUrl":"https://doi.org/10.1145/3095770.3095774","url":null,"abstract":"This paper describes improvements in task scheduling for the Chapel parallel programming language provided in its default on-node tasking runtime, the Qthreads library. We describe a new scheduler distrib which builds on the approaches of two previous Qthreads schedulers, Sherwood and Nemesis, and combines the best aspects of both --work stealing and load balancing from Sherwood and a lock free queue access from Nemesis-- to make task queuing better suited for the use of Chapel in the manycore era. We demonstrate the efficacy of this new scheduler by showing improvements in various individual benchmarks of the Chapel test suite on the Intel Knights Landing architecture.","PeriodicalId":205790,"journal":{"name":"Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers ROSS 2017","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129785903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward Full Specialization of the HPC Software Stack: Reconciling Application Containers and Lightweight Multi-kernels","authors":"Balazs Gerofi, R. Riesen, R. Wisniewski, Y. Ishikawa","doi":"10.1145/3095770.3095777","DOIUrl":"https://doi.org/10.1145/3095770.3095777","url":null,"abstract":"Application containers enable users to have greater control of their user-space execution environment by bundling application code with all the necessary libraries in a single software package. Lightweight multi-kernels leverage multi-core CPUs to run separate operating system (OS) kernels on different CPU cores, usually a lightweight kernel (LWK) and Linux. A multi-kernel's primary goal is attaining LWK scalability and performance in combination with support for the Linux APIs and environment. Both of these technologies are designed to address the increasing hardware complexity and the growing software diversity of High Performance Computing (HPC) systems. While containers enable specialization of user-space components, the LWK part of a multi-kernel system is also a form of software specialization, but targeting kernel space. This paper proposes a framework for combining application containers with multi-kernel operating systems thereby enabling specialization across the software stack. We provide an overview of the Linux container technologies and the challenges we faced to bring these two technologies together. Results from previous work show that multi-kernels can achieve better isolation than Linux. In this work, we deployed our framework on 1,024 Intel Xeon Phi Knights Landing nodes. We highlight two important results obtained from running at a larger scale. First, we show that containers impose zero runtime overhead even at scale. Second, by taking advantage of our integrated framework, we demonstrate that users can transparently benefit from lightweight multi-kernels, attaining identical speedups to the native multi-kernel execution.","PeriodicalId":205790,"journal":{"name":"Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers ROSS 2017","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116721842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Seastar: A Comprehensive Framework for Telemetry Data in HPC Environments","authors":"Ole Weidner, A. Barker, M. Atkinson","doi":"10.1145/3095770.3095775","DOIUrl":"https://doi.org/10.1145/3095770.3095775","url":null,"abstract":"A large number of 2nd generation high-performance computing applications and services rely on adaptive and dynamic architectures and execution strategies to run efficiently, resiliently, and at scale on today's HPC infrastructures. They require information about applications and their environment to steer and optimize execution. We define this information as telemetry data. Current HPC platforms do not provide the infrastructure, interfaces and conceptual models to collect, store, analyze, and access such data. Today, applications depend on application and platform specific techniques for collecting telemetry data; introducing significant development overheads that inhibit portability and mobility. The development and adoption of adaptive, context-aware strategies is thereby impaired. To facilitate 2nd generation applications, more efficient application development, and swift adoption of adaptive applications in production, a comprehensive framework for telemetry data management must be provided by future HPC systems and services. We introduce Seastar, a conceptual model and a software framework to collect, store, analyze, and exploit streams of telemetry data generated by HPC systems and their applications. We show how Seastar can be integrated with HPC platform architectures and how it enables common application execution strategies.","PeriodicalId":205790,"journal":{"name":"Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers ROSS 2017","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126781709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Operating and Runtime Systems Challenges for HPC Systems","authors":"A. Maccabe","doi":"10.1145/3095770.3095771","DOIUrl":"https://doi.org/10.1145/3095770.3095771","url":null,"abstract":"Future HPC systems will be characterized by extreme heterogeneity. We will see increasing heterogeneity in virtually every aspect of node architecture from computational engines to memory systems. We will see increasing heterogeneity in applications, including heterogeneity within applications (as previously independent applications are composed to build new applications). We will see increasing heterogeneity in system usage models; in some cases, the HPC system is not the most precious resource being managed. We will also see increasing heterogeneity in the shared services (e.g., storage and visualization systems) that are connected to HPC systems. All of this increasing heterogeneity is certain to create new challenges in the design and implementation of operating and runtime systems. There will be new kinds of resources to manage and many resource management tactics will be invented (and some re-discovered and adapted) to address the new heterogeneity. In essence, we will tacitly agree that the operating and runtime systems need to adapt to enable the inevitable integration of new technologies, applications, usage models, and shared services. While this agreement is critical for our ability to make incremental progress, we, as a community, must step back and ask the relevant question: Does the OS or runtime system bear the brunt of the adaptation, or will we be able to insist on changes in the technologies, applications, and environment? In the past decade, we have seen a similar tradeoff play out between the application teams and the architects of computational engines: how much floating point precision is required and how is this precision implemented? How can we define similar tradeoffs that are important in the design and implementation of operating and runtime systems?","PeriodicalId":205790,"journal":{"name":"Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers ROSS 2017","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121808004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Effect of Asymmetric Performance on Asynchronous Task Based Runtimes","authors":"D. Ganguly, J. Lange","doi":"10.1145/3095770.3095778","DOIUrl":"https://doi.org/10.1145/3095770.3095778","url":null,"abstract":"It is generally accepted that future supercomputing workloads will consist of application compositions made up of coupled simulations as well as in-situ analytics. While these components have commonly been deployed using a space-shared configuration to minimize cross-workload interference, it is likely that not all the workload components will require the full processing capacity of the CPU cores they are running on. For instance, an analytics workload often does not need to run continuously and is not generally considered to have the same priority as simulation codes. In a space-shared configuration, this arrangement would lead to wasted resources due to periodically idle CPUs, which are generally unusable by traditional bulk synchronous parallel (BSP) applications. As a result, many have started to reconsider task based runtimes owing to their ability to dynamically utilize available CPU resources. While the dynamic behavior of task-based runtimes had historically been targeted at application induced load imbalances, the same basic situation arises due to the asymmetric performance resulting from time sharing a CPU with other workloads. Many have assumed that task based runtimes would be able to adapt easily to these new environments without significant modifications. In this paper, we present a preliminary set of experiments that measured how well asynchronous task-based runtimes are able to respond to load imbalances caused by the asymmetric performance of time shared CPUs. Our work focuses on a set of experiments using benchmarks running on both Charm++ and HPX-5 in the presence of a competing workload. The results show that while these runtimes are better suited at handling the scenarios than traditional runtimes, they are not yet capable of effectively addressing anything other than a fairly minimal level of CPU contention.","PeriodicalId":205790,"journal":{"name":"Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers ROSS 2017","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117287802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantitative Evaluation of Intel PEBS Overhead for Online System-Noise Analysis","authors":"Soramichi Akiyama, Takahiro Hirofuchi","doi":"10.1145/3095770.3095773","DOIUrl":"https://doi.org/10.1145/3095770.3095773","url":null,"abstract":"Analyzing system-noise incurred to high-throughput systems (e.g., Spark, RDBMS) from the underlying machines must be in the granularity of the message- or request-level to find the root causes of performance anomalies, because messages are passed through many components in very short periods. To this end, we consider using Precise Event Based Sampling (PEBS) equipped in Intel CPUs at higher sampling rates than used normally is promising. It saves context information (e.g., the general purpose registers) at occurrences of various hardware events such as cache misses. The information can be used to associate performance anomalies caused by system noise with specific messages. One challenge is that quantitative analysis of PEBS overhead with high sampling rates has not yet been studied. This is critical because high sampling rates can cause severe overhead but performance problems are often reproducible only in real environments. In this paper, we evaluate the overhead of PEBS and show: (1) every time PEBS saves context information, the target workload slows down by 200-300 ns due to the CPU overhead of PEBS, (2) the CPU overhead can be used to predict actual overhead incurred with complex workloads including multi-threaded ones with high accuracy, and (3) PEBS incurs cache pollution and extra memory IO since PEBS writes data into the CPU cache, and the severity of cache pollution is affected both by the sampling rate and the buffer size allocated for PEBS. To the best of our knowledge, we are the first to quantitatively analyze the overhead of PEBS.","PeriodicalId":205790,"journal":{"name":"Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers ROSS 2017","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130540289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers ROSS 2017","authors":"","doi":"10.1145/3095770","DOIUrl":"https://doi.org/10.1145/3095770","url":null,"abstract":"","PeriodicalId":205790,"journal":{"name":"Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers ROSS 2017","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114966610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}