Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers (ROSS 2017): Latest Publications

Jitter-Trace: a low-overhead OS noise tracing tool based on Linux Perf
N. Gonzalez, Alessandro Morari, Fabio Checconi
{"title":"Jitter-Trace: a low-overhead OS noise tracing tool based on Linux Perf","authors":"N. Gonzalez, Alessandro Morari, Fabio Checconi","doi":"10.1145/3095770.3095772","DOIUrl":"https://doi.org/10.1145/3095770.3095772","url":null,"abstract":"Operating System (OS) noise is a well-known phenomenon in which OS activities interfere with the execution of large-scale parallel applications. Due to OS noise, feature-rich software environments such as Linux can seriously affect scalability. Kernel tracing can be used to identify OS noise sources, but until recently it required substantial OS modifications. This paper presents Jitter-Trace, a low-overhead tool that identifies and quantifies jitter sources. Jitter-Trace calculates the jitter generated by each OS activity, providing a complete set of task profiles and histograms of OS noise. This data is essential to implement OS noise mitigation strategies and reduce its impact on scalability. Jitter-Trace leverages the tracing and profiling capabilities of Linux Perf, which is widely available in current Linux distributions. Perf is tightly integrated in the Linux kernel and features a lightweight implementation.","PeriodicalId":205790,"journal":{"name":"Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers ROSS 2017","volume":"2673 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127033604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
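The paper itself gives no code; as a rough, language-agnostic illustration of the kind of per-activity noise accounting Jitter-Trace performs (not its actual implementation), the sketch below aggregates hypothetical (task, duration) interruption records, such as might be extracted from a parsed scheduler trace, into per-task totals and coarse histograms.

```python
from collections import defaultdict

def noise_profile(interruptions, bin_us=10):
    """Aggregate (task, duration_us) interruption records into per-task
    totals and coarse duration histograms, in the spirit of the per-activity
    jitter accounting described in the abstract. The input format is
    hypothetical; a real tool would obtain it from a scheduler trace."""
    totals = defaultdict(float)                      # total noise per OS activity
    histos = defaultdict(lambda: defaultdict(int))   # duration histogram per activity
    for task, dur_us in interruptions:
        totals[task] += dur_us
        histos[task][int(dur_us // bin_us) * bin_us] += 1
    return totals, histos

# Hypothetical records: (interrupting kernel task, duration in microseconds).
records = [("ksoftirqd/0", 35.0), ("kworker/0:1", 12.5), ("ksoftirqd/0", 41.2)]
totals, histos = noise_profile(records)
for task, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{task}: {total:.1f} us total, histogram {dict(histos[task])}")
```
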
UNITY: Unified Memory and File Space
T. Jones, Michael J. Brim, Geoffroy R. Vallée, B. Mayer, A. Welch, Tonglin Li, M. Lang, Latchesar Ionkov, Douglas Otstott, Ada Gavrilovska, G. Eisenhauer, Thaleia Dimitra Doudali, Pradeep R. Fernando
{"title":"UNITY: Unified Memory and File Space","authors":"T. Jones, Michael J. Brim, Geoffroy R. Vallée, B. Mayer, A. Welch, Tonglin Li, M. Lang, Latchesar Ionkov, Douglas Otstott, Ada Gavrilovska, G. Eisenhauer, Thaleia Dimitra Doudali, Pradeep R. Fernando","doi":"10.1145/3095770.3095776","DOIUrl":"https://doi.org/10.1145/3095770.3095776","url":null,"abstract":"This paper describes the vision for UNITY, a new high-performance computing focused data storage abstraction that places the entire memory hierarchy, including both traditionally separated memory-and file-based data storage, into one storage continuum. Through the use of a novel API and a set of services centered around a smart runtime system, UNITY is able to provide a number of valuable and interesting benefits. The unified storage space provides a scalable and resilient data environment that dynamically manages the mapping of data onto available resources based on multiple factors, including desired persistence and energy budget considerations. By eliminating the need for high-performance computing domain scientists to develop architecture-dependent optimizations for rapidly evolving data storage technologies, UNITY addresses both ease-of-use and performance.","PeriodicalId":205790,"journal":{"name":"Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers ROSS 2017","volume":"90 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124162972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
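The abstract describes UNITY only at the vision level, so its API is not shown here. For contrast, the sketch below shows the kind of explicit memory/file boundary crossing that applications must perform today on Linux (standard POSIX mmap usage, not UNITY): sizing a file, mapping it into memory, writing through memory, and syncing for persistence, which is exactly the kind of manual plumbing a unified storage continuum aims to remove.

```python
import mmap
import os

# Today, moving data between "memory" and "file" storage is the programmer's
# job: create and size a file, map it, write through memory, then sync.
fd = os.open("unity_demo.bin", os.O_RDWR | os.O_CREAT, 0o644)
os.ftruncate(fd, 4096)
with mmap.mmap(fd, 4096) as buf:
    buf[:16] = b"persist me pleas"   # update the mapped region in memory
    buf.flush()                      # msync: make the in-memory update durable
os.close(fd)
```
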
Scheduling Chapel Tasks with Qthreads on Manycore: A Tale of Two Schedulers
N. Evans, Stephen L. Olivier, R. Barrett, George Stelle
{"title":"Scheduling Chapel Tasks with Qthreads on Manycore: A Tale of Two Schedulers","authors":"N. Evans, Stephen L. Olivier, R. Barrett, George Stelle","doi":"10.1145/3095770.3095774","DOIUrl":"https://doi.org/10.1145/3095770.3095774","url":null,"abstract":"This paper describes improvements in task scheduling for the Chapel parallel programming language provided in its default on-node tasking runtime, the Qthreads library. We describe a new scheduler distrib which builds on the approaches of two previous Qthreads schedulers, Sherwood and Nemesis, and combines the best aspects of both --work stealing and load balancing from Sherwood and a lock free queue access from Nemesis-- to make task queuing better suited for the use of Chapel in the manycore era. We demonstrate the efficacy of this new scheduler by showing improvements in various individual benchmarks of the Chapel test suite on the Intel Knights Landing architecture.","PeriodicalId":205790,"journal":{"name":"Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers ROSS 2017","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129785903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
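As a language-agnostic illustration of the work-stealing idea that distrib builds on (this is not Qthreads or Chapel code, and it uses plain Python deques rather than the lock-free queues mentioned above), the following toy sketch gives each worker its own deque: owners pop work from the back, and idle workers steal from the front of a victim's deque.

```python
import random
import threading
from collections import deque

class WorkStealingPool:
    """Toy work-stealing scheduler: one deque per worker; owners pop from the
    back (LIFO), thieves steal from the front (FIFO). Illustrative only; the
    distrib scheduler in Qthreads is lock-free and far more sophisticated."""
    def __init__(self, n_workers=4):
        self.queues = [deque() for _ in range(n_workers)]
        self.n = n_workers

    def submit(self, worker_id, task):
        self.queues[worker_id].append(task)

    def _get_task(self, worker_id):
        try:
            return self.queues[worker_id].pop()        # local LIFO pop
        except IndexError:
            victims = [i for i in range(self.n) if i != worker_id]
            random.shuffle(victims)
            for v in victims:
                try:
                    return self.queues[v].popleft()    # steal the oldest task
                except IndexError:
                    continue
            return None

    def _worker(self, worker_id):
        while (task := self._get_task(worker_id)) is not None:
            task()

    def run(self):
        threads = [threading.Thread(target=self._worker, args=(i,)) for i in range(self.n)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

pool = WorkStealingPool()
for i in range(20):
    # All tasks start on worker 0's deque; the others obtain work by stealing.
    pool.submit(0, lambda i=i: print(f"task {i} ran on {threading.current_thread().name}"))
pool.run()
```
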
Toward Full Specialization of the HPC Software Stack: Reconciling Application Containers and Lightweight Multi-kernels
Balazs Gerofi, R. Riesen, R. Wisniewski, Y. Ishikawa
{"title":"Toward Full Specialization of the HPC Software Stack: Reconciling Application Containers and Lightweight Multi-kernels","authors":"Balazs Gerofi, R. Riesen, R. Wisniewski, Y. Ishikawa","doi":"10.1145/3095770.3095777","DOIUrl":"https://doi.org/10.1145/3095770.3095777","url":null,"abstract":"Application containers enable users to have greater control of their user-space execution environment by bundling application code with all the necessary libraries in a single software package. Lightweight multi-kernels leverage multi-core CPUs to run separate operating system (OS) kernels on different CPU cores, usually a lightweight kernel (LWK) and Linux. A multi-kernel's primary goal is attaining LWK scalability and performance in combination with support for the Linux APIs and environment. Both of these technologies are designed to address the increasing hardware complexity and the growing software diversity of High Performance Computing (HPC) systems. While containers enable specialization of user-space components, the LWK part of a multi-kernel system is also a form of software specialization, but targeting kernel space. This paper proposes a framework for combining application containers with multi-kernel operating systems thereby enabling specialization across the software stack. We provide an overview of the Linux container technologies and the challenges we faced to bring these two technologies together. Results from previous work show that multi-kernels can achieve better isolation than Linux. In this work, we deployed our framework on 1,024 Intel Xeon Phi Knights Landing nodes. We highlight two important results obtained from running at a larger scale. First, we show that containers impose zero runtime overhead even at scale. Second, by taking advantage of our integrated framework, we demonstrate that users can transparently benefit from lightweight multi-kernels, attaining identical speedups to the native multi-kernel execution.","PeriodicalId":205790,"journal":{"name":"Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers ROSS 2017","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116721842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
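As a minimal reminder of the mechanism Linux application containers are built on (namespaces), and assuming the util-linux unshare tool is installed, the snippet below runs a command inside a fresh user namespace with the caller mapped to root. This is generic Linux functionality, not the paper's framework, and the multi-kernel (LWK) side has no such single-command analogue.

```python
import subprocess

# Run `id` inside a new user namespace with the current user mapped to root.
# Namespaces are the isolation primitive that application containers build
# their specialized user-space environments on.
result = subprocess.run(
    ["unshare", "--user", "--map-root-user", "id"],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())   # typically reports uid=0(root) inside the namespace
```
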
Seastar: A Comprehensive Framework for Telemetry Data in HPC Environments
Ole Weidner, A. Barker, M. Atkinson
{"title":"Seastar: A Comprehensive Framework for Telemetry Data in HPC Environments","authors":"Ole Weidner, A. Barker, M. Atkinson","doi":"10.1145/3095770.3095775","DOIUrl":"https://doi.org/10.1145/3095770.3095775","url":null,"abstract":"A large number of 2nd generation high-performance computing applications and services rely on adaptive and dynamic architectures and execution strategies to run efficiently, resiliently, and at scale on today's HPC infrastructures. They require information about applications and their environment to steer and optimize execution. We define this information as telemetry data. Current HPC platforms do not provide the infrastructure, interfaces and conceptual models to collect, store, analyze, and access such data. Today, applications depend on application and platform specific techniques for collecting telemetry data; introducing significant development overheads that inhibit portability and mobility. The development and adoption of adaptive, context-aware strategies is thereby impaired. To facilitate 2nd generation applications, more efficient application development, and swift adoption of adaptive applications in production, a comprehensive framework for telemetry data management must be provided by future HPC systems and services. We introduce Seastar, a conceptual model and a software framework to collect, store, analyze, and exploit streams of telemetry data generated by HPC systems and their applications. We show how Seastar can be integrated with HPC platform architectures and how it enables common application execution strategies.","PeriodicalId":205790,"journal":{"name":"Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers ROSS 2017","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126781709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
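Seastar's actual interfaces are not given in the abstract; as a purely hypothetical sketch of what a node-level telemetry stream of the kind described might look like, the snippet below periodically samples /proc/loadavg and /proc/meminfo on Linux and emits timestamped JSON records.

```python
import json
import time

def read_meminfo_kb(field="MemAvailable"):
    # Parse one field (reported in kB) from /proc/meminfo.
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    return None

def telemetry_stream(interval_s=1.0):
    """Yield timestamped telemetry records for this node. The record schema is
    hypothetical; it only illustrates the kind of stream a framework like
    Seastar would collect, store, and expose to adaptive applications."""
    while True:
        with open("/proc/loadavg") as f:
            load1 = float(f.read().split()[0])
        yield {
            "ts": time.time(),
            "load_1min": load1,
            "mem_available_kb": read_meminfo_kb(),
        }
        time.sleep(interval_s)

for _, record in zip(range(3), telemetry_stream()):
    print(json.dumps(record))
```
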
Operating and Runtime Systems Challenges for HPC Systems
A. Maccabe
{"title":"Operating and Runtime Systems Challenges for HPC Systems","authors":"A. Maccabe","doi":"10.1145/3095770.3095771","DOIUrl":"https://doi.org/10.1145/3095770.3095771","url":null,"abstract":"Future HPC systems will be characterized by extreme heterogeneity. We will see increasing heterogeneity in virtually every aspect of node architecture from computational engines to memory systems. We will see increasing heterogeneity in applications, including heterogeneity within applications (as previously independent applications are composed to build new applications). We will see increasing heterogeneity in system usage models; in some cases, the HPC system is not the most precious resource being managed. We will also see increasing heterogeneity in the shared services (e.g., storage and visualization systems) that are connected to HPC systems. All of this increasing heterogeneity is certain to create new challenges in the design and implementation of operating and runtime systems. There will be new kinds of resources to manage and many resource management tactics will be invented (and some re-discovered and adapted) to address the new heterogeneity. In essence, we will tacitly agree that the operating and runtime systems need to adapt to enable the inevitable integration of new technologies, applications, usage models, and shared services. While this agreement is critical for our ability to make incremental progress, we, as a community, must step back and ask the relevant question: Does the OS or runtime system bear the brunt of the adaptation, or will we be able to insist on changes in the technologies, applications, and environment? In the past decade, we have seen a similar tradeoff play out between the application teams and the architects of computational engines: how much floating point precision is required and how is this precision implemented? How can we define similar tradeoffs that are important in the design and implementation of operating and runtime systems?","PeriodicalId":205790,"journal":{"name":"Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers ROSS 2017","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121808004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
The Effect of Asymmetric Performance on Asynchronous Task Based Runtimes
D. Ganguly, J. Lange
{"title":"The Effect of Asymmetric Performance on Asynchronous Task Based Runtimes","authors":"D. Ganguly, J. Lange","doi":"10.1145/3095770.3095778","DOIUrl":"https://doi.org/10.1145/3095770.3095778","url":null,"abstract":"It is generally accepted that future supercomputing workloads will consist of application compositions made up of coupled simulations as well as in-situ analytics. While these components have commonly been deployed using a space-shared configuration to minimize cross-workload interference, it is likely that not all the workload components will require the full processing capacity of the CPU cores they are running on. For instance, an analytics workload often does not need to run continuously and is not generally considered to have the same priority as simulation codes. In a space-shared configuration, this arrangement would lead to wasted resources due to periodically idle CPUs, which are generally unusable by traditional bulk synchronous parallel (BSP) applications. As a result, many have started to reconsider task based runtimes owing to their ability to dynamically utilize available CPU resources. While the dynamic behavior of task-based runtimes had historically been targeted at application induced load imbalances, the same basic situation arises due to the asymmetric performance resulting from time sharing a CPU with other workloads. Many have assumed that task based runtimes would be able to adapt easily to these new environments without significant modifications. In this paper, we present a preliminary set of experiments that measured how well asynchronous task-based runtimes are able to respond to load imbalances caused by the asymmetric performance of time shared CPUs. Our work focuses on a set of experiments using benchmarks running on both Charm++ and HPX-5 in the presence of a competing workload. The results show that while these runtimes are better suited at handling the scenarios than traditional runtimes, they are not yet capable of effectively addressing anything other than a fairly minimal level of CPU contention.","PeriodicalId":205790,"journal":{"name":"Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers ROSS 2017","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117287802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
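The abstract describes benchmarks run "in the presence of a competing workload". As a hypothetical, Linux-only sketch of how such asymmetric CPU contention can be induced (not the authors' actual experimental harness), the snippet below pins busy-loop processes to a chosen subset of cores so that any runtime worker scheduled there time-shares its CPU.

```python
import multiprocessing as mp
import os
import time

def spinner(cpu, seconds):
    # Pin this process to a single core and burn CPU there, so any task-based
    # runtime worker placed on the same core sees asymmetric (time-shared)
    # performance relative to workers on uncontended cores.
    os.sched_setaffinity(0, {cpu})
    end = time.time() + seconds
    while time.time() < end:
        pass

if __name__ == "__main__":
    contended_cores = [0, 1]   # hypothetical choice: interfere with cores 0 and 1
    procs = [mp.Process(target=spinner, args=(c, 10)) for c in contended_cores]
    for p in procs:
        p.start()
    # ... launch the Charm++/HPX-5 benchmark here and compare against an
    # uncontended run ...
    for p in procs:
        p.join()
```
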
Quantitative Evaluation of Intel PEBS Overhead for Online System-Noise Analysis
Soramichi Akiyama, Takahiro Hirofuchi
{"title":"Quantitative Evaluation of Intel PEBS Overhead for Online System-Noise Analysis","authors":"Soramichi Akiyama, Takahiro Hirofuchi","doi":"10.1145/3095770.3095773","DOIUrl":"https://doi.org/10.1145/3095770.3095773","url":null,"abstract":"Analyzing system-noise incurred to high-throughput systems (e.g., Spark, RDBMS) from the underlying machines must be in the granularity of the message- or request-level to find the root causes of performance anomalies, because messages are passed through many components in very short periods. To this end, we consider using Precise Event Based Sampling (PEBS) equipped in Intel CPUs at higher sampling rates than used normally is promising. It saves context information (e.g., the general purpose registers) at occurrences of various hardware events such as cache misses. The information can be used to associate performance anomalies caused by system noise with specific messages. One challenge is that quantitative analysis of PEBS overhead with high sampling rates has not yet been studied. This is critical because high sampling rates can cause severe overhead but performance problems are often reproducible only in real environments. In this paper, we evaluate the overhead of PEBS and show: (1) every time PEBS saves context information, the target workload slows down by 200-300 ns due to the CPU overhead of PEBS, (2) the CPU overhead can be used to predict actual overhead incurred with complex workloads including multi-threaded ones with high accuracy, and (3) PEBS incurs cache pollution and extra memory IO since PEBS writes data into the CPU cache, and the severity of cache pollution is affected both by the sampling rate and the buffer size allocated for PEBS. To the best of our knowledge, we are the first to quantitatively analyze the overhead of PEBS.","PeriodicalId":205790,"journal":{"name":"Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers ROSS 2017","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130540289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
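Using the per-sample CPU cost of roughly 200-300 ns reported in the abstract, a back-of-envelope estimate of the direct slowdown from PEBS sampling (ignoring the cache-pollution effects the paper also measures) can be computed as below; the event rate and sampling period are made-up example values, not figures from the paper.

```python
def pebs_cpu_overhead(event_rate_per_s, sample_period, cost_ns=250.0):
    """Estimated fraction of CPU time spent capturing PEBS records.
    event_rate_per_s: how often the sampled hardware event fires (e.g. LLC misses/s)
    sample_period:    one PEBS record is taken every `sample_period` events
    cost_ns:          per-record cost, roughly 200-300 ns according to the paper"""
    samples_per_s = event_rate_per_s / sample_period
    return samples_per_s * cost_ns * 1e-9

# Example: 20 million cache misses per second, sampling every 1,000 misses.
overhead = pebs_cpu_overhead(event_rate_per_s=20e6, sample_period=1_000)
print(f"estimated direct CPU overhead: {overhead:.3%}")   # about 0.5%
```
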
Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers ROSS 2017 (front matter)
{"title":"Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers ROSS 2017","authors":"","doi":"10.1145/3095770","DOIUrl":"https://doi.org/10.1145/3095770","url":null,"abstract":"","PeriodicalId":205790,"journal":{"name":"Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers ROSS 2017","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114966610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0