{"title":"Design and implementation of a customizable work stealing scheduler","authors":"Jun Nakashima, Sho Nakatani, K. Taura","doi":"10.1145/2491661.2481433","DOIUrl":"https://doi.org/10.1145/2491661.2481433","url":null,"abstract":"An efficient scheduler is important for task parallelism. It should provide scalable dynamic load-balancing mechanism among CPU cores. To meet this requirement, most runtime systems for task parallelism use work stealing as scheduling strategy. Work stealing schedulers typically steal work randomly. This strategy does not consider hardware specific knowledge such as memory hierarchy or application specific knowledge such as cache usage. In order to execute tasks more efficiently, work stealing schedulers should take such knowledge into account. To this end, we propose an API that can customize scheduling strategies and take hardware and application specific knowledge into account while preserving the desirable properties of work stealing.\u0000 This paper describes the design of our proposed API. Specifically, it provides mechanisms to give scheduling hints for tasks and to implement user-defined work stealing functions. They enable programmers to implement a work stealing strategy optimized for their applications. This paper also presents preliminary evaluation results of the proposed API. A kernel of STREAM microbenchmark improved by 58.8% with a work stealing strategy utilizing data cached by the previous iteration. Performance of matrix multiply improved by 18.2% on 32 AMD cores by a work stealing strategy that tries to steal as a coarse grained task as possible.","PeriodicalId":335825,"journal":{"name":"International Workshop on Runtime and Operating Systems for Supercomputers","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125611543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enabling accurate power profiling of HPC applications on exascale systems","authors":"Gokcen Kestor, R. Gioiosa, D. Kerbyson, A. Hoisie","doi":"10.1145/2491661.2481429","DOIUrl":"https://doi.org/10.1145/2491661.2481429","url":null,"abstract":"Despite being one of the most important limiting factors on the road to exascale computing, power is not yet considered a \"first-class citizen\" among the system resources. As a result, there is no clear OS interface that exposes accurate resource power consumption to user-level runtimes that implement power-aware software algorithms.\u0000 In this work we propose a System Monitor Interface (SMI) between the OS and the user runtime that exposes accurate, per-core power consumption. To make up for the lack of reliable per-core power sensors, we implement a proxy power sensor, based on a regression analysis of core activity, that provides per-core information. SMI effectively hides the implementation details from the user, who has the perception of reading power information from a real sensor. This allows us these proxy sensors to be replaced with real hardware sensors when the latter becomes available, without the need to modify user-level software.\u0000 Using SMI and the proxy power sensors, we implement a power profiling runtime library and analyzed applications from the NPB benchmark suite and the Exascale Co-Design Centers. Our results show that accurate, per-core power information is necessary for the development of exascale system software and for comprehensively understanding the power characteristics of parallel scientific applications.","PeriodicalId":335825,"journal":{"name":"International Workshop on Runtime and Operating Systems for Supercomputers","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129487121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hobbes: composition and virtualization as the foundations of an extreme-scale OS/R","authors":"R. Brightwell, R. Oldfield, A. Maccabe, D. Bernholdt","doi":"10.1145/2491661.2481427","DOIUrl":"https://doi.org/10.1145/2491661.2481427","url":null,"abstract":"This paper describes our vision for Hobbes, an operating system and runtime (OS/R) framework for extreme-scale systems. The Hobbes design explicitly supports application composition, which is emerging as a key approach for applications to address scalability and power concerns anticipated with coming extreme-scale architectures. We make use of virtualization technologies to provide the flexibility to support requirements of application components for different node-level operating systems and runtimes, as well as different mappings of the components onto the hardware. We describe the architecture of the Hobbes OS/R, how we will address the cross-cutting concerns of power/energy, scheduling of massive levels of parallelism, and resilience. We also outline how the \"users\" of the OS/R (programming models, applications, and tools) influence the design.","PeriodicalId":335825,"journal":{"name":"International Workshop on Runtime and Operating Systems for Supercomputers","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116404879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A gossip-based approach to exascale system services","authors":"Philip Soltero, P. Bridges, D. Arnold, M. Lang","doi":"10.1145/2491661.2481428","DOIUrl":"https://doi.org/10.1145/2491661.2481428","url":null,"abstract":"Large-scale server deployments in the commercial internet space have been using group based protocols such as peer-to-peer and gossip to allow coordination of services and data across global distributed data centers. Here we look at applying these methods, which are themselves derived from early work in distributed systems, to large-scale, tightly-coupled systems used in high performance computing.\u0000 In this paper, we study Gossip protocols and their ability to aggregate data across large-scale systems in support of system services. We report accuracy and performance of these estimated results and then focus on a simulated power-capping service to show the tradeoffs of this approach in practice.","PeriodicalId":335825,"journal":{"name":"International Workshop on Runtime and Operating Systems for Supercomputers","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130862741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data deduplication in a hybrid architecture for improving write performance","authors":"Chao Chen, Jonathan Bastnagel, Yong Chen","doi":"10.1145/2491661.2481435","DOIUrl":"https://doi.org/10.1145/2491661.2481435","url":null,"abstract":"Big Data computing provides a promising new opportunity for scientific discoveries and innovations. However, it also poses a significant challenge to the high-end computing community. An effective I/O solution is urgently required to support big data applications run on high-end computing systems. In this study, we propose a new approach namely DDiHA, Data Deduplication in Hybrid Architecture, to improve the write performance for write-intensive big data applications. The DDiHA approach utilizes data deduplications to reduce the size of data volumes before they are transfered and written to the storage. A hybrid architecture is introduced to facilitate data deduplications. Both theoretical study and prototyping verification were conducted to evaluate the DDiHA approach. The initial results have shown that, given the same compute resources, the DDiHA system outperformed the conventional architecture, even though it introduces additional computation workload from data deduplications. The DDiHA approach reduces the data size transferred across the network and improves the I/O system performance. It has a promising potential for write-intensive big data applications.","PeriodicalId":335825,"journal":{"name":"International Workshop on Runtime and Operating Systems for Supercomputers","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122541145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Characteristics of adaptive runtime systems in HPC","authors":"L. Kalé","doi":"10.1145/2481425.2481426","DOIUrl":"https://doi.org/10.1145/2481425.2481426","url":null,"abstract":"The phrase \"Runtime System\" is somewhat broad and is used with differing meanings in differing contexts. The Java runtime and most of the MPI runtimes are focused on providing mechanisms. In contrast, adaptive runtime systems emphasize strategies, in addition to providing mechanisms. This talk will look at some characteristics that make HPC RTSs adaptive. These include dynamic load balancing, exploitation of the \"principle of persistence\" to learn from recent data, automatic allocation to heterogeneous processors, automatic optimization of communication, application reconfiguration via control-points, automated control and optimization of temperature/power/energy/execution-time, automated tolerance of component failures so as to maintain the rate of computational progress in presence of such failures, and adapting to memory availability. The talk will examine these characteristics, and what features are necessary and/or desirable to empower the runtime system. I will illustrate it using examples from the runtime system underlying Charm++ and Adaptive MPI.","PeriodicalId":335825,"journal":{"name":"International Workshop on Runtime and Operating Systems for Supercomputers","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116583890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating the feasibility of using memory content similarity to improve system resilience","authors":"Scott Levy, P. Bridges, Kurt B. Ferreira, A. Thompson, C. Trott","doi":"10.1145/2491661.2481432","DOIUrl":"https://doi.org/10.1145/2491661.2481432","url":null,"abstract":"Building the next-generation of extreme-scale distributed systems will require overcoming several challenges related to system resilience. As the number of processors in these systems grows, the failure rate increases proportionally. One of the most common sources of failure in large-scale systems is memory errors. In this paper, we propose a novel run-time for transparently exploiting memory content similarity to improve system resilience by reducing the rate at which memory errors lead to node failure. We evaluate the feasibility of this approach by examining memory snapshots collected from eight HPC applications. Based on the characteristics of the similarity that we uncover in these applications, we conclude that our proposed approach shows promise for addressing system resilience in large-scale systems.","PeriodicalId":335825,"journal":{"name":"International Workshop on Runtime and Operating Systems for Supercomputers","volume":"34 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123147497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transparently consistent asynchronous shared memory","authors":"Hakan Akkan, Latchesar Ionkov, M. Lang","doi":"10.1145/2491661.2481431","DOIUrl":"https://doi.org/10.1145/2491661.2481431","url":null,"abstract":"The advent of many-core processors is imposing many changes on the operating system. The resources that are under contention have changed; previously, CPU cycles were the resource in demand and required fair and precise sharing. Now compute cycles are plentiful, but the memory per core is decreasing. In the past, scientific applications used all the CPU cores to finish as fast as possible, with visualization and analysis of the data performed after the simulation finished. With decreasing memory available per core, as well as the higher price (in power and time) for storing data on disk or sending it over the network, it now makes sense to run visualization and analytics applications in-situ, while the application is running. Visualization and analytics applications then need to sample the simulation memory with as little interference and as little changes in the simulation code as possible.\u0000 We propose an asynchronous memory sharing facility that allows consistent states of the memory to be shared between processes without any implicit or explicit synchronization. We distinguish two types of processes; a single producer and one or more observers. The producer modifies the state of the data, making available consistent versions of the state to any observer. The observers, working at different sampling rates, can access the latest available consistent state.\u0000 Some applications that would benefit from this type of facility include check-pointing applications, processes monitoring, unobtrusive process debugging, and the sharing of data for visualization or analytics. To evaluate our ideas we have developed two kernel-level implementations for sharing data asynchronously and we compared these implementations to a traditional user-space synchronized multi-buffer method.\u0000 We have seen improvements of up to 3.5x in our tests over the traditional multi-buffer method with 20% of the data pages touched.","PeriodicalId":335825,"journal":{"name":"International Workshop on Runtime and Operating Systems for Supercomputers","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128574827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An early prototype of an autonomic performance environment for exascale","authors":"K. Huck, S. Shende, A. Malony, Hartmut Kaiser, Allan Porterfield, R. Fowler, R. Brightwell","doi":"10.1145/2491661.2481434","DOIUrl":"https://doi.org/10.1145/2491661.2481434","url":null,"abstract":"Extreme-scale computing requires a new perspective on the role of performance observation in the Exascale system software stack. Because of the anticipated high concurrency and dynamic operation in these systems, it is no longer reasonable to expect that a post-mortem performance measurement and analysis methodology will suffice. Rather, there is a strong need for performance observation that merges first-and third-person observation, in situ analysis, and introspection across stack layers that serves online dynamic feedback and adaptation. In this paper we describe the DOE-funded XPRESS project and the role of autonomic performance support in Exascale systems. XPRESS will build an integrated Exascale software stack (called OpenX) that supports the ParalleX execution model and is targeted towards future Exascale platforms. An initial version of an autonomic performance environment called APEX has been developed for OpenX using the current TAU performance technology and results are presented that highlight the challenges of highly integrative observation and runtime analysis.","PeriodicalId":335825,"journal":{"name":"International Workshop on Runtime and Operating Systems for Supercomputers","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128737256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A file I/O system for many-core based clusters","authors":"Yuki Matsuo, Taku Shimosawa, Y. Ishikawa","doi":"10.1145/2318916.2318920","DOIUrl":"https://doi.org/10.1145/2318916.2318920","url":null,"abstract":"A many-core based co-processor, such as the Intel Many Integrated Core (MIC) Architecture, connected to a server-level multi-core host processor via a PCI Express bus, has recently been the subject of a great deal of attention. In such a machine, because the many-core is separated from the host processor with disk I/O and it also has limited cache and memory bandwidth, performance degradation can results from cache pollution and data transfer latency caused by processing file operations.\u0000 Three types of file I/O mechanisms for the many-core in such a system are designed, implemented, and evaluated in this paper. One mechanism involves the file I/O system calls being performed by the kernel running on the same core that the application program is running on. Another is a mechanism whereby those system calls are offloaded to the kernel running on a dedicated core of the many-core that handles file I/O operations. In either case, the kernel requests file data transfer to the file system on the host processor and file data is cached on the many-core. The third mechanism involves the system calls being offloaded to the kernel running on the host processor so that the host kernel transfers data directly to the user buffer in the many-core.\u0000 The experimental results show that the first two mechanisms, performing in the many-core, are superior to offloading them to the host when the data size is relatively small because they are designed to conduct file I/O operations through a file cache and fewer of communications occur between the many-core and the host. With larger data sizes, however, file I/O system calls offloaded to the host, which transfer data directly to/from the user buffer, are better than those performed inside the many-core. In view of cache awareness, it is shown that the user code and part of the file I/O system calls can be performed efficiently when the user buffer data is small enough to be on the cache.","PeriodicalId":335825,"journal":{"name":"International Workshop on Runtime and Operating Systems for Supercomputers","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115247650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}