{"title":"Fairness-oriented OS Scheduling Support for Multicore Systems","authors":"Changdae Kim, Jaehyuk Huh","doi":"10.1145/2925426.2926262","DOIUrl":"https://doi.org/10.1145/2925426.2926262","url":null,"abstract":"Although traditional CPU scheduling efficiently utilizes multiple cores with equal computing capacity, the advent of multicores with diverse capabilities pose challenges to CPU scheduling. For the multi-cores with uneven computing capability, scheduling is essential to exploit the efficiency of core asymmetry, by matching each application with the best core type. However, in addition to the efficiency, an important aspect of CPU scheduling is fairness in CPU provisioning. Such uneven core capability is inherently unfair to threads and causes performance variance, as applications running on fast cores receive higher capability than applications on slow cores. Depending on co-running applications and scheduling decisions, the performance of an application may vary significantly. This study investigates the fairness problem in multi-cores with uneven capability, and explores the design space of OS schedulers supporting multiple fairness constraints. In this paper, we consider two fairness-oriented constraints, minimum fairness for the minimum guaranteed performance and uniformity for performance variation reduction. This study proposes three scheduling policies which guarantee a minimum performance bound while improving the overall throughput and reducing performance variation too. The three proposed fairness-oriented schedulers are implemented for the Linux kernel with an online application monitoring technique. Using an emulated asymmetric multi-core with frequency scaling and a real asymmetric multi-core with the big.LITTLE architecture, the paper shows that the proposed schedulers can effectively support the specified fairness while improving overall system throughput.","PeriodicalId":422112,"journal":{"name":"Proceedings of the 2016 International Conference on Supercomputing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127361882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lynx: Using OS and Hardware Support for Fast Fine-Grained Inter-Core Communication","authors":"Konstantina Mitropoulou, Vasileios Porpodas, Xiaochun Zhang, Timothy M. Jones","doi":"10.1145/2925426.2926274","DOIUrl":"https://doi.org/10.1145/2925426.2926274","url":null,"abstract":"Designing high-performance software queues for fast intercore communication is challenging, but critical for maximising software parallelism. State-of-the-art single-producer / single-consumer queues for streaming applications contain multiple sections, requiring the producer and consumer to operate independently on different sections from each other. While these queues perform well for coarse-grained data transfers, they perform poorly in the fine-grained case. This paper proposes Lynx, a novel SP/SC queue, specifically tuned for fine-grained communication. Lynx is built from the ground up, reducing the generated code on the critical-path to just two operations per enqueue and dequeue. To achieve this it relies on existing commodity processor hardware and operating system exception handling support to deal with infrequent queue maintenance operations. Lynx outperforms the state-of-the art by up to 1.57x in total 64-bit throughput reaching a peak throughput of 15.7GB/s on a common desktop system. Real applications using Lynx get a performance improvement of up to 1.4x.","PeriodicalId":422112,"journal":{"name":"Proceedings of the 2016 International Conference on Supercomputing","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131529208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BLASX: A High Performance Level-3 BLAS Library for Heterogeneous Multi-GPU Computing","authors":"Linnan Wang, Wei Wu, Jianxiong Xiao, Yezhou Yang","doi":"10.1145/2925426.2926256","DOIUrl":"https://doi.org/10.1145/2925426.2926256","url":null,"abstract":"Basic Linear Algebra Subprograms (BLAS) are a set of low level linear algebra kernels widely adopted by applications involved with the deep learning and scientific computing. The massive and economic computing power brought forth by the emerging GPU architectures drives interest in implementation of compute-intensive level 3 BLAS on multi-GPU systems. In this paper, we investigate existing multi-GPU level 3 BLAS and present that 1) issues, such as the improper load balancing, inefficient communication, insufficient GPU stream level concurrency and data caching, impede current implementations from fully harnessing heterogeneous computing resources; 2) and the inter-GPU Peer-to-Peer(P2P) communication remains unexplored. We then present BLASX: a highly optimized multi-GPU level-3 BLAS. We adopt the concepts of algorithms-by-tiles treating a matrix tile as the basic data unit and operations on tiles as the basic task. Tasks are guided with a dynamic asynchronous runtime, which is cache and locality aware. The communication cost under BLASX becomes trivial as it perfectly overlaps communication and computation across multiple streams during asynchronous task progression. It also takes the current tile cache scheme one step further by proposing an innovative 2-level hierarchical tile cache, taking advantage of inter-GPU P2P communication. As a result, linear speedup is observable with BLASX under multi-GPU configurations; and the extensive benchmarks demonstrate that BLASX consistently outperforms the related leading industrial and academic implementations such as cuBLAS-XT, SuperMatrix, MAGMA.","PeriodicalId":422112,"journal":{"name":"Proceedings of the 2016 International Conference on Supercomputing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129795970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}