ROSS@ICS Latest Publications

Reduction of operating system jitter caused by page reclaim
ROSS@ICS Pub Date: 2014-06-10 DOI: 10.1145/2612262.2612270
Y. Oyama, Shun Ishiguro, J. Murakami, Shin Sasaki, R. Matsumiya, O. Tatebe
Operating system jitter is one of the major causes of runtime overhead in high-performance computing applications. Jitter results from the execution of services by the operating system kernel, such as interrupt handling and tasklets, or from daemon processes that provide operating system services, such as memory management daemons. This execution interrupts application computations and increases their execution time. Jitter significantly affects applications in which many processes or threads frequently synchronize with each other. In this paper, we investigate the impact of jitter caused by reclaiming memory pages and propose a method for reducing that impact. The target operating system is Linux. When the Linux kernel runs out of memory, it awakens a special kernel thread to reclaim memory pages that are unlikely to be used in the near future. If the kernel thread is awakened frequently, application performance degrades because of its resource consumption. The proposed method reclaims memory pages in advance of the kernel thread. It reclaims more pages at one time than the kernel thread does, thus reducing the frequency of page reclaim and the impact of jitter. We implement a system based on the proposed method and conduct an experiment using practical weather forecast software. Results of the experiment show that the proposed method minimizes the performance degradation caused by jitter.
Citations: 5
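The trade-off at the heart of this paper, fewer but larger reclaim passes, can be illustrated with a toy cost model. All constants and batch sizes below are hypothetical, chosen only to show why batching wakeups reduces total interruption time; they are not measurements from the paper.

```python
# Toy model: each kswapd-style wakeup carries a fixed interruption cost, so
# reclaiming more pages per wakeup means fewer wakeups for the same total
# number of reclaimed pages, and therefore less jitter.

def jitter_cost(total_pages, batch_size, wakeup_cost_us=50.0, per_page_us=0.2):
    """Total time (microseconds) an application is interrupted while
    `total_pages` are reclaimed in batches of `batch_size` pages."""
    wakeups = -(-total_pages // batch_size)  # ceiling division
    return wakeups * wakeup_cost_us + total_pages * per_page_us

frequent = jitter_cost(100_000, batch_size=32)     # kernel-thread style: small batches
batched = jitter_cost(100_000, batch_size=8_192)   # proposed style: large batches
print(f"small batches: {frequent:.0f} us, large batches: {batched:.0f} us")
```

Under this model the per-page work is identical in both cases; only the fixed wakeup overhead, which is what interrupts tightly synchronized processes, shrinks.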
Overhead of a decentralized gossip algorithm on the performance of HPC applications
ROSS@ICS Pub Date: 2014-06-10 DOI: 10.1145/2612262.2612271
Ely Levy, A. Barak, A. Shiloh, Matthias Lieber, C. Weinhold, Hermann Härtig
Gossip algorithms can provide online information about the availability and state of resources in supercomputers. These algorithms require minimal computing and storage capabilities at each node, and when properly tuned they are not expected to overload the nodes or the network that connects them. These properties make gossip interesting for future exascale systems. This paper examines the overhead of a decentralized gossip algorithm on the performance of parallel MPI applications running on up to 8192 nodes of an IBM Blue Gene/Q supercomputer. The applications used in the experiments include PTRANS and MPI-FFT from the HPCC benchmark suite, as well as the coupled weather and cloud simulation model COSMO-SPECS+FD4. In most cases, no gossip overhead was observed when gossip messages were sent at intervals of 256 ms or more. As expected, the overhead observed at higher rates is sensitive to the communication pattern of the application and the amount of gossip information being circulated.
Citations: 6
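To see why gossip traffic can stay cheap even on thousands of nodes, here is a minimal push-gossip simulation. This is the generic textbook variant, not the paper's algorithm: in each round, every node that already holds a piece of information forwards it to one uniformly random node, so the information reaches all N nodes in O(log N) rounds.

```python
import random

def rounds_to_disseminate(n_nodes, seed=0):
    """Count push-gossip rounds until one node's state reaches all nodes."""
    rng = random.Random(seed)
    informed = {0}          # node 0 starts with the information
    rounds = 0
    while len(informed) < n_nodes:
        # Every informed node pushes to one random target this round.
        informed |= {rng.randrange(n_nodes) for _ in informed}
        rounds += 1
    return rounds

print(rounds_to_disseminate(8192))
```

Since the informed set can at most double per round, 8192 nodes need at least 13 rounds, and in practice a few more; at one message per node per interval (e.g. every 256 ms, the threshold below which the paper starts to see overhead), the per-node load is constant regardless of machine size.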
VMM emulation of Intel hardware transactional memory
ROSS@ICS Pub Date: 2014-06-10 DOI: 10.1145/2612262.2612265
Maciej Swiech, Kyle C. Hale, P. Dinda
We describe the design, implementation, and evaluation of emulated hardware transactional memory, specifically the Intel Haswell Restricted Transactional Memory (RTM) architectural extensions for x86/64, within a virtual machine monitor (VMM). Our system allows users to investigate RTM on hardware that does not provide it, debug their RTM-based transactional software, and stress-test it on diverse emulated hardware configurations, including potential future configurations that might support arbitrary-length transactions. Initial performance results suggest that we are able to accomplish this approximately 60 times faster than under a full emulator. A noteworthy aspect of our system is a novel page-flipping technique that allows us to avoid instruction emulation entirely and to limit instruction decoding to only what is necessary to determine instruction length. This makes it possible to implement RTM emulation, and potentially other techniques, far more compactly than would otherwise be possible. We have implemented our system in the context of the Palacios VMM. Our techniques are not specific to Palacios and could be implemented in other VMMs.
Citations: 2
Hybrid MPI: a case study on the Xeon Phi platform
ROSS@ICS Pub Date: 2014-06-10 DOI: 10.1145/2612262.2612267
U. Wickramasinghe, G. Bronevetsky, A. Lumsdaine, A. Friedley
New many-core architectures such as the Intel Xeon Phi offer applications significantly higher power efficiency than conventional multi-core processors. However, while this processor's compute and communication performance is an excellent match for MPI applications, leveraging its potential in practice has proven difficult because of the mismatch between the MPI distributed-memory model and the processor's shared-memory communication hardware. Hybrid MPI is a high-performance, portable implementation of MPI designed for communication over shared-memory hardware. It shares the heaps of all the MPI processes that run on the same node, enabling them to communicate directly without unnecessary copies. This paper describes our work to port Hybrid MPI to the Xeon Phi platform, demonstrating that Hybrid MPI offers better performance than the native Intel MPI implementation in terms of memory bandwidth, latency, and benchmark performance.
Citations: 5
Automatic SMT threading for OpenMP applications on the Intel Xeon Phi co-processor
ROSS@ICS Pub Date: 2014-06-10 DOI: 10.1145/2612262.2612268
W. Heirman, Trevor E. Carlson, K. V. Craeynest, I. Hur, A. Jaleel, L. Eeckhout
Simultaneous multithreading is a technique that can improve performance when running parallel applications on the Intel Xeon Phi co-processor. Selecting the most efficient thread count is, however, non-trivial, as the potential increase in efficiency has to be balanced against other, potentially negative factors such as inter-thread competition for cache capacity and increased synchronization overheads.
In this paper, we extend CRUST (ClusteR-aware Undersubscribed Scheduling of Threads), a technique for finding the optimum thread count of OpenMP applications running on clustered cache architectures, to take the behavior of simultaneous multithreading on the Xeon Phi into account. CRUST can automatically find the optimum thread count at sub-application granularity by exploiting application phase behavior at OpenMP parallel-section boundaries, and it uses hardware performance counter information to gain insight into the application's behavior. We implement a CRUST prototype inside the Intel OpenMP runtime library and show its efficiency running on real Xeon Phi hardware.
Citations: 10
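The thread-count selection problem can be sketched with an invented cost model: speedup from more threads traded against a penalty that grows with thread count (standing in for cache competition and synchronization overhead). The model and its constants are illustrative only; CRUST itself measures real hardware performance counters rather than predicting from a formula. The candidate counts correspond to 1 to 4 SMT threads per core on a 60-core Phi.

```python
# Amdahl-style runtime estimate plus a linear contention penalty per thread.
# All constants are made up for illustration.
def predicted_runtime(threads, serial=1.0, parallel=240.0, contention=0.02):
    return serial + parallel / threads + contention * threads

def best_thread_count(candidates=(60, 120, 180, 240)):
    """Pick the candidate thread count with the lowest modeled runtime."""
    return min(candidates, key=predicted_runtime)

print(best_thread_count())
```

Under this model, full SMT subscription (240 threads) loses to 120 threads because the contention term outgrows the shrinking parallel term, which is exactly the kind of non-obvious optimum an automatic search is meant to find per parallel section.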
Revisiting virtual memory for high performance computing on manycore architectures: a hybrid segmentation kernel approach
ROSS@ICS Pub Date: 2014-06-10 DOI: 10.1145/2612262.2612264
Yuki Soma, Balazs Gerofi, Y. Ishikawa
Page-based memory management (paging) is used by most current operating systems (OSs) because of its rich features, such as prevention of memory fragmentation and fine-grained access control. Page-based virtual memory, however, stores virtual-to-physical mappings in page tables that themselves reside in main memory. Because translating virtual to physical addresses requires walking the page tables, which in turn implies additional memory accesses, modern CPUs employ translation lookaside buffers (TLBs) to cache the mappings. Nevertheless, TLBs are limited in size, and applications that consume a large amount of memory and exhibit little or no locality in their memory access patterns, such as graph algorithms, suffer from the high overhead of TLB misses.
This paper proposes a new hybrid kernel design targeting many-core CPUs, which manages the application's memory space by segmentation and offloads kernel services to dedicated CPU cores where paging is used. The method enables applications to run on top of low-cost segmented memory management while allowing the kernel to use the rich features of paging. We present the design and implementation of our kernel and demonstrate that segmentation can provide superior performance compared to both regular- and large-page-based virtual memory. For example, running Graph500 on top of our segmentation design on Intel's Xeon Phi chip yields up to 81% and 9% improvement compared to using 4 kB and 2 MB pages, respectively, in MPSS Linux.
Citations: 11
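The TLB-reach arithmetic behind the paper's motivation is easy to work out: with a fixed number of TLB entries, the address range covered without a miss is entries × page size, so a random-access working set larger than that keeps missing. The entry counts below are illustrative, not the Xeon Phi's exact TLB geometry.

```python
def tlb_reach_bytes(entries, page_size):
    # Total memory addressable through the TLB without a page-table walk.
    return entries * page_size

KB, MB = 1024, 1024 ** 2
small = tlb_reach_bytes(64, 4 * KB)   # 64 entries of 4 kB pages -> 256 kB
large = tlb_reach_bytes(8, 2 * MB)    # 8 entries of 2 MB pages  -> 16 MB
print(small // KB, "kB vs", large // MB, "MB")
```

Even the larger reach is tiny next to a multi-gigabyte graph workload, which is why both page sizes miss heavily. A segment, by contrast, maps an arbitrarily large contiguous range with a single base/limit pair, so there is no reach to exhaust, the property the hybrid kernel exploits.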
mOS: an architecture for extreme-scale operating systems
ROSS@ICS Pub Date: 2014-06-10 DOI: 10.1145/2612262.2612263
R. Wisniewski, T. Inglett, Pardo Keppel, Ravi Murty, R. Riesen
Linux®, or more specifically the Linux API, plays a key role in HPC. Even for extreme-scale computing, a known and familiar API is required for production machines. An off-the-shelf Linux distribution, however, faces challenges at extreme scale. To date, two approaches have been used to address the challenges of providing an operating system (OS) at extreme scale. In the Full-Weight Kernel (FWK) approach, an OS, typically Linux, forms the starting point, and work is undertaken to remove features from the environment so that it scales up across more cores and out across a large cluster. A Light-Weight Kernel (LWK) approach often starts with a new kernel, and work is undertaken to add functionality to provide a familiar API, typically Linux. Either approach, however, results in an execution environment that is not fully Linux compatible.
mOS (multi Operating System) runs both an FWK (Linux) and an LWK simultaneously as kernels on the same compute node. mOS thereby achieves the scalability and reliability of LWKs while providing the full Linux functionality of an FWK. Further, mOS works in concert with Operating System Nodes (OSNs) to offload system calls, e.g., I/O, that are too invasive to run on the compute nodes at extreme scale. Beyond providing full Linux capability with LWK performance, other advantages of mOS include the ability to effectively manage different types of compute and memory resources, to interface easily with proposed asynchronous and fine-grained runtimes, and to nimbly manage new technologies.
This paper is an architectural description of mOS. As a prototype is not yet finished, the contributions of this work are a description of mOS's architecture, an exploration of the trade-offs and value of this approach for the purposes listed above, and a detailed description of each of the six components of mOS, including the trade-offs we considered. The uptick in OS research indicates that many view this as an important area for getting to extreme scale. Most importantly, then, the goal of this paper is to generate discussion in this area at the workshop.
Citations: 69
An evaluation of BitTorrent's performance in HPC environments
ROSS@ICS Pub Date: 2014-06-10 DOI: 10.1145/2612262.2612269
Matthew G. F. Dosanjh, P. Bridges, S. M. Kelly, J. Laros, C. Vaughan
A number of novel decentralized systems have recently been developed to address challenges of scale in large distributed systems. The suitability of such systems for meeting the challenges of scale in high-performance computing (HPC) systems is unclear, however. In this paper, we begin to answer this question by examining the suitability of the popular BitTorrent protocol for dynamic shared-library distribution in HPC systems. To that end, we describe the architecture and implementation of a system that uses BitTorrent to distribute shared libraries in HPC systems, evaluate and optimize BitTorrent protocol usage for the HPC environment, and measure the performance of the resulting system. Our results demonstrate the potential viability of BitTorrent-style protocols in HPC systems, but also highlight their challenges. In particular, our results show that the protocol mechanisms meant to enforce fairness in a distributed computing environment can have a significant impact on system performance if not properly taken into account in system design and implementation.
Citations: 2
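One core BitTorrent mechanism relevant to distributing a library to many nodes is rarest-first piece selection: among the pieces a peer still needs, it requests the one held by the fewest other peers, so scarce pieces replicate early. This is a simplified sketch of the standard heuristic, not the tuned protocol usage the paper evaluates.

```python
from collections import Counter

def rarest_first(needed, peer_bitfields):
    """Pick the needed piece held by the fewest peers (ties broken by index).

    `peer_bitfields` is a list of sets, one per peer, of piece indices the
    peer holds."""
    availability = Counter(p for bf in peer_bitfields for p in bf)
    return min(needed, key=lambda p: (availability[p], p))

peers = [{0, 1, 2}, {1, 2}, {2}]
print(rarest_first({0, 1, 2}, peers))  # piece 0 is held by only one peer
```

In an HPC launch, where thousands of nodes want the same library simultaneously, spreading the scarce pieces first prevents the initial seeder from becoming the bottleneck.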
Building blocks for an exa-scale operating system
ROSS@ICS Pub Date: 2014-06-10 DOI: 10.1145/2612262.2627355
Hermann Härtig
Currently, high-performance systems are mostly used by splitting them into fixed-size partitions that are completely owned and operated by applications. Hardware architecture designs strive to remove the operating system from the critical path, for example through techniques such as RDMA and busy waiting for synchronization. Operating system functionality is restricted to batch schedulers that load and start applications, and to I/O. Applications take over traditional operating system functionality such as balancing load over resources.
In exascale computing, new challenges and opportunities may put an end to that mode of operation. These developments include applications too complex and too dynamic for application-level balancing, and hardware too diverse to maintain an application-level view of a fixed number of reliable and predictable resources. The talk will discuss examples of operating system building blocks at various system levels that may receive new appreciation in exascale supercomputing. These building blocks include schedulers, microkernels, library OSes, virtualization, execution-time predictors, and gossip algorithms, which need to be combined into a coherent architecture.
Citations: 0
PICS: a performance-analysis-based introspective control system to steer parallel applications
ROSS@ICS Pub Date: 2014-06-10 DOI: 10.1145/2612262.2612266
Yanhua Sun, J. Lifflander, L. Kalé
Parallel programming has always been difficult due to the complexity of hardware and the diversity of applications. Although significant progress has been achieved through the remarkable efforts of researchers in academia and industry, attaining high parallel efficiency on large supercomputers with millions of cores remains challenging for many applications. Performance tuning has therefore become more important and challenging than ever before. In this paper, we describe the design and implementation of PICS, a Performance-analysis-based Introspective Control System used to tune parallel programs. PICS provides a generic set of abstractions that let applications expose application-specific knowledge to the runtime system. The abstractions are called control points: tunable parameters that affect application performance. Application behavior is observed, measured, and automatically analyzed by PICS. Based on the analysis results and expert knowledge rules, program characteristics are extracted to assist the search for optimal configurations of the control points. We have implemented the PICS control system in Charm++, an asynchronous message-driven parallel programming model. We demonstrate the utility of PICS with several benchmarks and a real-world application, and we show its effectiveness.
Citations: 14
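The measure-analyze-adjust loop around a control point can be sketched as follows. The quadratic cost function standing in for an instrumented runtime measurement, and the "grain size" control point with its scaling candidates, are hypothetical; PICS drives the loop with real measurements and expert knowledge rules inside the Charm++ runtime.

```python
def measure(grain_size):
    # Stand-in for an instrumented runtime measurement; in this invented
    # landscape the application runs fastest near grain_size == 32.
    return (grain_size - 32) ** 2 + 100

def steer(grain_size, candidates=(0.5, 1.0, 2.0)):
    """One control-system step: try scaled settings of the control point
    and keep whichever measures best."""
    trials = {max(1, int(grain_size * s)) for s in candidates}
    return min(trials, key=measure)

setting = 8
for _ in range(6):          # iterate: measure -> analyze -> adjust
    setting = steer(setting)
print(setting)
```

Starting from a poor setting, repeated halve/keep/double probing converges on the optimum and then stays there, illustrating how a runtime system can tune a control point without any offline search.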