{"title":"Proceedings of the 8th International Workshop on Runtime and Operating Systems for Supercomputers","authors":"","doi":"10.1145/3217189","DOIUrl":"https://doi.org/10.1145/3217189","url":null,"abstract":"","PeriodicalId":183802,"journal":{"name":"Proceedings of the 8th International Workshop on Runtime and Operating Systems for Supercomputers","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124959332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Management, In-Situ Workflows and Extreme Scales","authors":"M. Parashar","doi":"10.1145/3217189.3217190","DOIUrl":"https://doi.org/10.1145/3217189.3217190","url":null,"abstract":"Data-related challenges are dominating computational and data-enabled sciences and are limiting the potential impact of scientific application workflows enabled by extreme scale computing environments. While data staging and in-situ/in-transit data processing have emerged as attractive approaches for supporting these extreme scale workflows, the increasing heterogeneity of the storage hierarchy, coupled with increasing data volumes and complex and dynamic data access/exchange patterns, are impacting the effectiveness of these techniques. In this talk I will discuss these challenges and explore how autonomic runtime techniques are being explored to address them. I will then present autonomic policies as well as cross layer mechanisms that are part of DataSpaces, an extreme scale data staging service. This research is part of the DataSpaces project at the Rutgers Discovery Informatics Institute.","PeriodicalId":183802,"journal":{"name":"Proceedings of the 8th International Workshop on Runtime and Operating Systems for Supercomputers","volume":"92 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116706223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Is the Heap Manager Important to Many Cores?","authors":"Ye Liu, S. Kato, M. Edahiro","doi":"10.1145/3217189.3217194","DOIUrl":"https://doi.org/10.1145/3217189.3217194","url":null,"abstract":"The scalability problem, which presents that the performance of a multi-threaded program keeps constant or is degraded as more threads are involved when running on many-core processors, still poses challenges to OS designers and application programmers. Previous research work has demonstrated that removing bottlenecks associated with synchronization and making tasks equally distributed across processing cores from the perspective of OS designers and application programmers respectively, are beneficial to solve the scalability problem. However, as shown in this paper, our analysis on the heap manager indicates that researchers should pay attention to techniques of explicit memory management (i.e., malloc and free) on many cores as well. We have evaluated three popular heap managers including Ptmalloc, Hoard and Jemalloc using multi-threaded programs from the PARSEC benchmark suite on emerging tiled many-core processors. The experimental results exhibit that a well-designed scalable heap manager is important to the program performance and all evaluated heap managers have the chance to reduce the performance for some circumstances.","PeriodicalId":183802,"journal":{"name":"Proceedings of the 8th International Workshop on Runtime and Operating Systems for Supercomputers","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115575361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"rmalloc() and rpipe(): a uGNI-based Distributed Remote Memory Allocator and Access Library for One-sided Messaging","authors":"U. Wickramasinghe, A. Lumsdaine","doi":"10.1145/3217189.3217191","DOIUrl":"https://doi.org/10.1145/3217189.3217191","url":null,"abstract":"Optimizing communication is essential for high-performance computing because synchronization bottlenecks inhibit the overall performance and scalability of parallel applications. Today's cutting-edge computing hardware, as well as networking interfaces like Cray Aries/Gemini, features extremely low latency and high bandwidth remote memory access (RMA) operations for optimized data movement. However for any efficient data movement to occur between two logical processing units, software substrates must be able to properly exploit hardware resources for the underlying fabric. Overheads due to coarse granular synchronization and stalls during irregular access of remote memory regions may hint at two adverse effects of resource under-utilization in time and space. We introduce a uGNI-based distributed remote memory allocator called \"rmalloc\" which expands RDMA-enabled memory utilization, and a communication substrate called \"rpipe\" that tries to mitigate synchronization bottlenecks. Our UNIX-inspired RMA programming model is simple to use and equally applicable to both higher-level applications as well as lower-level runtime systems for enabling efficient data movement. Our micro-benchmark results suggest that \"rmalloc\" default next-fit allocator outperforms MPI-3.0 RMA by 1.5X and up to 6X in most cases, while other variants of \"rmalloc\" (i.e. best-fit, worst-fit) reduce external fragmentation and perform comparably or better than the default \"rmalloc\" allocator for irregular RMA.","PeriodicalId":183802,"journal":{"name":"Proceedings of the 8th International Workshop on Runtime and Operating Systems for Supercomputers","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126022874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Direct-FUSE: Removing the Middleman for High-Performance FUSE File System Support","authors":"Yue Zhu, Teng Wang, K. Mohror, A. Moody, Kento Sato, Muhib Khan, Weikuan Yu","doi":"10.1145/3217189.3217195","DOIUrl":"https://doi.org/10.1145/3217189.3217195","url":null,"abstract":"Developing a file system is a challenging task, especially a kernel-level file system. User-level file systems alleviate the burden and development complexity associated with kernel-level implementations. The Filesystem in Userspace (FUSE) is a widely used tool that allows non-privileged users to develop file systems in user space. When a FUSE file system is mounted, it runs as a user-level process. Application programs and FUSE file system processes are bridged through FUSE kernel module. However, as the FUSE kernel module transfers requests between an application program and a file system process, the overheads in a FUSE file system call from crossing the user-kernel boundary is non-trivial. The overheads contain user-kernel mode switches, context switches, and additional memory copies. In this paper, we describe our Direct-FUSE framework that supports multiple FUSE file systems as well as other, custom user-level file systems in user space without the need to cross the user/kernel boundary into the FUSE kernel module. All layers of Direct-FUSE are in user space, and applications can directly use pre-defined unified file system calls to interact with different user-defined file systems. Our performance results show that Direct-FUSE can outperform some native FUSE file systems by 11.9% on average and does not add significant overhead over backend file systems.","PeriodicalId":183802,"journal":{"name":"Proceedings of the 8th International Workshop on Runtime and Operating Systems for Supercomputers","volume":" 7","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132158609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How to Make Profit: Exploiting Fluctuating Electricity Prices with Albatross, A Runtime System for Heterogeneous HPC Clusters","authors":"Timo Hönig, C. Eibel, Adam Wagenhäuser, Maximilian Wagner, Wolfgang Schröder-Preikschat","doi":"10.1145/3217189.3217193","DOIUrl":"https://doi.org/10.1145/3217189.3217193","url":null,"abstract":"The ongoing evolution of the power grid towards a highly dynamic supply system poses challenges as renewables induce new grid characteristics. The volatility of electricity sources leads to a fluctuating electricity price, which even becomes negative when excess supply occurs. Operators of high-performance--computing (HPC) clusters therefore can consider the highly dynamic variations of electricity prices to provide an energy-efficient and economic operation. This paper presents Albatross, a runtime system for heterogeneous HPC clusters. To ensure an energy-efficient and economic processing of HPC workloads, our system exploits heterogeneity at the hardware level and considers dynamic electricity prices. We have implemented Albatross and evaluate it on a heterogeneous HPC cluster in our lab to show how the power demand of the cluster decreases when electricity prices are high (i.e., excess demand at the grid). When electricity prices are low or negative (i.e., excess supply to the grid), Albatross purposefully increases the workload and, thus, power demand of the HPC cluster---to make profit.","PeriodicalId":183802,"journal":{"name":"Proceedings of the 8th International Workshop on Runtime and Operating Systems for Supercomputers","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122271470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Implementation of Fast memset() Using Hardware Accelerators","authors":"K. Pusukuri, R. Gardner, Jared C. Smolens","doi":"10.1145/3217189.3217192","DOIUrl":"https://doi.org/10.1145/3217189.3217192","url":null,"abstract":"","PeriodicalId":183802,"journal":{"name":"Proceedings of the 8th International Workshop on Runtime and Operating Systems for Supercomputers","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128768884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}