{"title":"Sequential Task Flow Runtime Model Improvements and Limitations","authors":"Yu Pei, G. Bosilca, J. Dongarra","doi":"10.1109/ROSS56639.2022.00009","DOIUrl":"https://doi.org/10.1109/ROSS56639.2022.00009","url":null,"abstract":"The sequential task flow (STF) model is the mainstream approach for interacting with task-based runtime systems, with StarPU and the Dynamic Task Discovery (DTD) in PaRSEC being two implementations of this model. Compared with other approaches to submitting tasks to a runtime system, STF has interesting advantages centered around an easy-to-use API that allows users to express algorithms as a sequence of tasks (much like in OpenMP), while allowing the runtime to automatically identify and analyze task dependencies and scheduling. In this paper, we focus on the DTD interface in PaRSEC, highlight some of its lesser-known limitations, and implement two optimization techniques for DTD: support for user-level graph trimming, and a new API for broadcasting read-only data to remote tasks. We then analyze the benefits and limitations of these optimizations with benchmarks as well as on two common matrix factorization kernels, Cholesky and QR, on two different systems, Shaheen II from KAUST and Fugaku from RIKEN. We point out some potential for further improvements and provide valuable insights into the strengths and weaknesses of the STF model, hoping to guide the future development of task-based runtime systems.","PeriodicalId":226739,"journal":{"name":"2022 IEEE/ACM International Workshop on Runtime and Operating Systems for Supercomputers (ROSS)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123213460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Porting the Kitten Lightweight Kernel Operating System to RISC-V","authors":"Nicholas Gordon, K. Pedretti, J. Lange","doi":"10.1109/ROSS56639.2022.00008","DOIUrl":"https://doi.org/10.1109/ROSS56639.2022.00008","url":null,"abstract":"Hardware design in high-performance computing (HPC) is often highly experimental. Exploring new designs is difficult and time-consuming, requiring lengthy vendor cooperation. RISC-V is an open-source processor ISA that improves the accessibility of chip design, including the ability to do hardware/software co-design using open-source hardware and tools. Co-design allows design decisions to easily flow across the hardware/software boundary and influence future design ideas. However, new hardware designs require corresponding software to drive and test them. Conventional operating systems like Linux are massively complex and modification is time-prohibitive. In this paper, we describe our port of the Kitten lightweight kernel operating system to RISC-V in order to provide an alternative to Linux for conducting co-design research. Kitten's small code base and simple resource management policies are well matched for quickly exploring new hardware ideas that may require radical operating system modifications and restructuring. Our evaluation shows that Kitten on RISC-V is functional and provides similar performance to Linux for single-core benchmarks. This provides a solid foundation for using Kitten in future co-design research involving RISC-V.","PeriodicalId":226739,"journal":{"name":"2022 IEEE/ACM International Workshop on Runtime and Operating Systems for Supercomputers (ROSS)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125556231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Efficient Oversubscription: On the Cost and Benefit of Event-Based Communication in MPI","authors":"Jan Bierbaum, Maksym Planeta, Hermann Härtig","doi":"10.1109/ROSS56639.2022.00007","DOIUrl":"https://doi.org/10.1109/ROSS56639.2022.00007","url":null,"abstract":"Contemporary HPC systems use batch scheduling of compute jobs running on exclusively assigned hardware resources. During communication, polling for progress is the state of the art as it promises minimal latency. Previous work on oversubscription and event-based communication, i.e., vacating the CPU while waiting for communication to finish, shows that these techniques can improve the overall system utilisation and reduce the energy consumption. Despite these findings, neither of the two techniques is commonly used in HPC systems today. We believe that the current lack of detailed studies of the low-level effects of event-based communication, a key enabler of efficient oversubscription for classical MPI-based applications, is a major obstacle to wider adoption. We demonstrate that the sched_yield system call, which is often used for oversubscription scenarios, is not best suited for this purpose on modern Linux systems. Furthermore, we incorporate event-based communication into Open MPI and evaluate the effects on latency and energy consumption using an MPI micro-benchmark. Our results indicate that event-based communication incurs significant latency overhead but also saves energy. Both effects grow with the imbalance of the application using MPI.","PeriodicalId":226739,"journal":{"name":"2022 IEEE/ACM International Workshop on Runtime and Operating Systems for Supercomputers (ROSS)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121544182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Half Title Page","authors":"","doi":"10.1109/ross56639.2022.00001","DOIUrl":"https://doi.org/10.1109/ross56639.2022.00001","url":null,"abstract":"","PeriodicalId":226739,"journal":{"name":"2022 IEEE/ACM International Workshop on Runtime and Operating Systems for Supercomputers (ROSS)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134001216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ROSS 2022 Workshop Organization","authors":"","doi":"10.1109/ross56639.2022.00005","DOIUrl":"https://doi.org/10.1109/ross56639.2022.00005","url":null,"abstract":"","PeriodicalId":226739,"journal":{"name":"2022 IEEE/ACM International Workshop on Runtime and Operating Systems for Supercomputers (ROSS)","volume":"125 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114698556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}