{"title":"Proceedings of the 8th International Workshop on Runtime and Operating Systems for Supercomputers","authors":"","doi":"10.1145/3217189","DOIUrl":"https://doi.org/10.1145/3217189","url":null,"abstract":"","PeriodicalId":183802,"journal":{"name":"Proceedings of the 8th International Workshop on Runtime and Operating Systems for Supercomputers","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124959332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Management, In-Situ Workflows and Extreme Scales","authors":"M. Parashar","doi":"10.1145/3217189.3217190","DOIUrl":"https://doi.org/10.1145/3217189.3217190","url":null,"abstract":"Data-related challenges are dominating computational and data-enabled sciences and are limiting the potential impact of scientific application workflows enabled by extreme scale computing environments. While data staging and in-situ/in-transit data processing have emerged as attractive approaches for supporting these extreme scale workflows, the increasing heterogeneity of the storage hierarchy, coupled with increasing data volumes and complex and dynamic data access/exchange patterns, are impacting the effectiveness of these techniques. In this talk I will discuss these challenges and explore how autonomic runtime techniques are being explored to address them. I will then present autonomic policies as well as cross layer mechanisms that are part of DataSpaces, an extreme scale data staging service. This research is part of the DataSpaces project at the Rutgers Discovery Informatics Institute.","PeriodicalId":183802,"journal":{"name":"Proceedings of the 8th International Workshop on Runtime and Operating Systems for Supercomputers","volume":"92 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116706223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Is the Heap Manager Important to Many Cores?","authors":"Ye Liu, S. Kato, M. Edahiro","doi":"10.1145/3217189.3217194","DOIUrl":"https://doi.org/10.1145/3217189.3217194","url":null,"abstract":"The scalability problem, which presents that the performance of a multi-threaded program keeps constant or is degraded as more threads are involved when running on many-core processors, still poses challenges to OS designers and application programmers. Previous research work has demonstrated that removing bottlenecks associated with synchronization and making tasks equally distributed across processing cores from the perspective of OS designers and application programmers respectively, are beneficial to solve the scalability problem. However, as shown in this paper, our analysis on the heap manager indicates that researchers should pay attention to techniques of explicit memory management (i.e., malloc and free) on many cores as well. We have evaluated three popular heap managers including Ptmalloc, Hoard and Jemalloc using multi-threaded programs from the PARSEC benchmark suite on emerging tiled many-core processors. The experimental results exhibit that a well-designed scalable heap manager is important to the program performance and all evaluated heap managers have the chance to reduce the performance for some circumstances.","PeriodicalId":183802,"journal":{"name":"Proceedings of the 8th International Workshop on Runtime and Operating Systems for Supercomputers","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115575361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"rmalloc() and rpipe(): a uGNI-based Distributed Remote Memory Allocator and Access Library for One-sided Messaging","authors":"U. Wickramasinghe, A. Lumsdaine","doi":"10.1145/3217189.3217191","DOIUrl":"https://doi.org/10.1145/3217189.3217191","url":null,"abstract":"Optimizing communication is essential for high-performance computing because synchronization bottlenecks inhibit the overall performance and scalability of parallel applications. Today's cutting-edge computing hardware, as well as networking interfaces like Cray Aries/Gemini, features extremely low latency and high bandwidth remote memory access (RMA) operations for optimized data movement. However for any efficient data movement to occur between two logical processing units, software substrates must be able to properly exploit hardware resources for the underlying fabric. Overheads due to coarse granular synchronization and stalls during irregular access of remote memory regions may hint at two adverse effects of resource under-utilization in time and space. We introduce a uGNI-based distributed remote memory allocator called \"rmalloc\" which expands RDMA-enabled memory utilization, and a communication substrate called \"rpipe\" that tries to mitigate synchronization bottlenecks. Our UNIX-inspired RMA programming model is simple to use and equally applicable to both higher-level applications as well as lower-level runtime systems for enabling efficient data movement. Our micro-benchmark results suggest that \"rmalloc\" default next-fit allocator outperforms MPI-3.0 RMA by 1.5X and up to 6X in most cases, while other variants of \"rmalloc\" (i.e. best-fit, worst-fit) reduce external fragmentation and perform comparably or better than the default \"rmalloc\" allocator for irregular RMA.","PeriodicalId":183802,"journal":{"name":"Proceedings of the 8th International Workshop on Runtime and Operating Systems for Supercomputers","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126022874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Direct-FUSE: Removing the Middleman for High-Performance FUSE File System Support","authors":"Yue Zhu, Teng Wang, K. Mohror, A. Moody, Kento Sato, Muhib Khan, Weikuan Yu","doi":"10.1145/3217189.3217195","DOIUrl":"https://doi.org/10.1145/3217189.3217195","url":null,"abstract":"Developing a file system is a challenging task, especially a kernel-level file system. User-level file systems alleviate the burden and development complexity associated with kernel-level implementations. The Filesystem in Userspace (FUSE) is a widely used tool that allows non-privileged users to develop file systems in user space. When a FUSE file system is mounted, it runs as a user-level process. Application programs and FUSE file system processes are bridged through FUSE kernel module. However, as the FUSE kernel module transfers requests between an application program and a file system process, the overheads in a FUSE file system call from crossing the user-kernel boundary is non-trivial. The overheads contain user-kernel mode switches, context switches, and additional memory copies. In this paper, we describe our Direct-FUSE framework that supports multiple FUSE file systems as well as other, custom user-level file systems in user space without the need to cross the user/kernel boundary into the FUSE kernel module. All layers of Direct-FUSE are in user space, and applications can directly use pre-defined unified file system calls to interact with different user-defined file systems. Our performance results show that Direct-FUSE can outperform some native FUSE file systems by 11.9% on average and does not add significant overhead over backend file systems.","PeriodicalId":183802,"journal":{"name":"Proceedings of the 8th International Workshop on Runtime and Operating Systems for Supercomputers","volume":" 7","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132158609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How to Make Profit: Exploiting Fluctuating Electricity Prices with Albatross, A Runtime System for Heterogeneous HPC Clusters","authors":"Timo Hönig, C. Eibel, Adam Wagenhäuser, Maximilian Wagner, Wolfgang Schröder-Preikschat","doi":"10.1145/3217189.3217193","DOIUrl":"https://doi.org/10.1145/3217189.3217193","url":null,"abstract":"The ongoing evolution of the power grid towards a highly dynamic supply system poses challenges as renewables induce new grid characteristics. The volatility of electricity sources leads to a fluctuating electricity price, which even becomes negative when excess supply occurs. Operators of high-performance--computing (HPC) clusters therefore can consider the highly dynamic variations of electricity prices to provide an energy-efficient and economic operation. This paper presents Albatross, a runtime system for heterogeneous HPC clusters. To ensure an energy-efficient and economic processing of HPC workloads, our system exploits heterogeneity at the hardware level and considers dynamic electricity prices. We have implemented Albatross and evaluate it on a heterogeneous HPC cluster in our lab to show how the power demand of the cluster decreases when electricity prices are high (i.e., excess demand at the grid). When electricity prices are low or negative (i.e., excess supply to the grid), Albatross purposefully increases the workload and, thus, power demand of the HPC cluster---to make profit.","PeriodicalId":183802,"journal":{"name":"Proceedings of the 8th International Workshop on Runtime and Operating Systems for Supercomputers","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122271470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Implementation of Fast memset() Using Hardware Accelerators","authors":"K. Pusukuri, R. Gardner, Jared C. Smolens","doi":"10.1145/3217189.3217192","DOIUrl":"https://doi.org/10.1145/3217189.3217192","url":null,"abstract":"","PeriodicalId":183802,"journal":{"name":"Proceedings of the 8th International Workshop on Runtime and Operating Systems for Supercomputers","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128768884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}