Proceedings of the 2016 International Conference on Supercomputing: Latest Publications

Fairness-oriented OS Scheduling Support for Multicore Systems
Pub Date: 2016-06-01 | DOI: 10.1145/2925426.2926262
Changdae Kim, Jaehyuk Huh
Abstract: Although traditional CPU scheduling efficiently utilizes multiple cores with equal computing capacity, the advent of multi-cores with diverse capabilities poses challenges to CPU scheduling. For multi-cores with uneven computing capability, scheduling is essential to exploit the efficiency of core asymmetry by matching each application with the best core type. However, in addition to efficiency, an important aspect of CPU scheduling is fairness in CPU provisioning. Uneven core capability is inherently unfair to threads and causes performance variance, as applications running on fast cores receive more capability than applications on slow cores. Depending on the co-running applications and scheduling decisions, the performance of an application may vary significantly. This study investigates the fairness problem in multi-cores with uneven capability and explores the design space of OS schedulers supporting multiple fairness constraints. In this paper, we consider two fairness-oriented constraints: minimum fairness, for a minimum guaranteed performance, and uniformity, for reduced performance variation. This study proposes three scheduling policies that guarantee a minimum performance bound while also improving overall throughput and reducing performance variation. The three proposed fairness-oriented schedulers are implemented for the Linux kernel with an online application monitoring technique. Using an emulated asymmetric multi-core based on frequency scaling and a real asymmetric multi-core with the big.LITTLE architecture, the paper shows that the proposed schedulers can effectively support the specified fairness while improving overall system throughput.
Citations: 12
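To make the scheduling trade-off concrete, the sketch below shows an epoch-based assignment loop for an asymmetric multicore that first honors a minimum fast-core share and then fills the remaining fast cores by measured speedup. The structures and parameters (Thread, fastShare, fastSpeedup, minFastShare) are hypothetical; this illustrates the general idea of combining a minimum-fairness guarantee with a throughput-oriented pass, not the paper's three policies or its Linux implementation.

```cpp
// Illustrative sketch only: assigning fast cores for the next epoch on an
// asymmetric multicore. All names here are hypothetical, not from the paper.
#include <algorithm>
#include <vector>

struct Thread {
    int id;
    double fastShare;    // fraction of past epochs spent on a fast core
    double fastSpeedup;  // measured speedup of this thread on a fast core
};

// Returns the ids of threads that should run on fast cores next epoch.
std::vector<int> pickFastCoreThreads(std::vector<Thread>& threads,
                                     int fastCores, double minFastShare) {
    std::vector<int> chosen;
    // 1) Minimum-fairness pass: threads below the guaranteed fast-core share
    //    get priority, most-starved first.
    std::sort(threads.begin(), threads.end(),
              [](const Thread& a, const Thread& b) {
                  return a.fastShare < b.fastShare;
              });
    for (const Thread& t : threads) {
        if ((int)chosen.size() >= fastCores) break;
        if (t.fastShare < minFastShare) chosen.push_back(t.id);
    }
    // 2) Throughput pass: fill any remaining fast cores with the threads
    //    that benefit most from them.
    std::sort(threads.begin(), threads.end(),
              [](const Thread& a, const Thread& b) {
                  return a.fastSpeedup > b.fastSpeedup;
              });
    for (const Thread& t : threads) {
        if ((int)chosen.size() >= fastCores) break;
        if (std::find(chosen.begin(), chosen.end(), t.id) == chosen.end())
            chosen.push_back(t.id);
    }
    return chosen;
}
```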
Lynx: Using OS and Hardware Support for Fast Fine-Grained Inter-Core Communication
Pub Date: 2016-03-24 | DOI: 10.1145/2925426.2926274
Konstantina Mitropoulou, Vasileios Porpodas, Xiaochun Zhang, Timothy M. Jones
Abstract: Designing high-performance software queues for fast inter-core communication is challenging but critical for maximising software parallelism. State-of-the-art single-producer / single-consumer queues for streaming applications contain multiple sections, requiring the producer and consumer to operate on different sections independently of each other. While these queues perform well for coarse-grained data transfers, they perform poorly in the fine-grained case. This paper proposes Lynx, a novel SP/SC queue specifically tuned for fine-grained communication. Lynx is built from the ground up, reducing the generated code on the critical path to just two operations per enqueue and dequeue. To achieve this, it relies on existing commodity processor hardware and operating-system exception-handling support to deal with infrequent queue maintenance operations. Lynx outperforms the state of the art by up to 1.57x in total 64-bit throughput, reaching a peak throughput of 15.7 GB/s on a common desktop system. Real applications using Lynx achieve a performance improvement of up to 1.4x.
Citations: 11
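For context, the sketch below is a conventional single-producer / single-consumer ring buffer with explicit head/tail indices and full/empty checks on the critical path, i.e. the kind of baseline Lynx improves on. It is not Lynx itself: the two-operation fast path and the exception-based maintenance described in the abstract are not reproduced here.

```cpp
// Illustrative sketch only: a generic lock-free SP/SC ring buffer for
// inter-core communication. N must be a power of two.
#include <atomic>
#include <cstddef>
#include <optional>

template <typename T, size_t N>
class SpScQueue {
    T buf_[N];
    std::atomic<size_t> head_{0};  // advanced by the consumer
    std::atomic<size_t> tail_{0};  // advanced by the producer
public:
    bool enqueue(const T& v) {
        size_t t = tail_.load(std::memory_order_relaxed);
        if (t - head_.load(std::memory_order_acquire) == N)
            return false;                       // queue full
        buf_[t & (N - 1)] = v;
        tail_.store(t + 1, std::memory_order_release);
        return true;
    }
    std::optional<T> dequeue() {
        size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire))
            return std::nullopt;                // queue empty
        T v = buf_[h & (N - 1)];
        head_.store(h + 1, std::memory_order_release);
        return v;
    }
};
```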
BLASX: A High Performance Level-3 BLAS Library for Heterogeneous Multi-GPU Computing
Pub Date: 2015-10-16 | DOI: 10.1145/2925426.2926256
Linnan Wang, Wei Wu, Jianxiong Xiao, Yezhou Yang
Abstract: Basic Linear Algebra Subprograms (BLAS) are a set of low-level linear algebra kernels widely adopted by deep learning and scientific computing applications. The massive and economical computing power brought forth by emerging GPU architectures drives interest in implementing compute-intensive Level-3 BLAS on multi-GPU systems. In this paper, we investigate existing multi-GPU Level-3 BLAS and show that 1) issues such as improper load balancing, inefficient communication, and insufficient GPU stream-level concurrency and data caching impede current implementations from fully harnessing heterogeneous computing resources; and 2) inter-GPU peer-to-peer (P2P) communication remains unexplored. We then present BLASX, a highly optimized multi-GPU Level-3 BLAS. We adopt the algorithms-by-tiles approach, treating a matrix tile as the basic data unit and operations on tiles as the basic task. Tasks are guided by a dynamic asynchronous runtime that is cache- and locality-aware. The communication cost under BLASX becomes trivial, as it perfectly overlaps communication and computation across multiple streams during asynchronous task progression. BLASX also takes the current tile cache scheme one step further by proposing an innovative two-level hierarchical tile cache that takes advantage of inter-GPU P2P communication. As a result, linear speedup is observable with BLASX under multi-GPU configurations, and extensive benchmarks demonstrate that BLASX consistently outperforms leading industrial and academic implementations such as cuBLAS-XT, SuperMatrix, and MAGMA.
Citations: 49
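To illustrate the algorithms-by-tiles decomposition the abstract describes, the sketch below enumerates the tile tasks of a GEMM (C = A * B) and assigns them round-robin across GPUs. The names (TileTask, buildGemmTasks) are hypothetical, and the naive round-robin assignment merely stands in for BLASX's cache- and locality-aware asynchronous runtime.

```cpp
// Illustrative sketch only: tile-task enumeration for C = A * B, where
// C is tilesM x tilesN tiles and the summation runs over tilesK tiles.
#include <vector>

struct TileTask {
    int ci, cj;   // coordinates of the C tile being updated
    int k;        // summation index: C[ci][cj] += A[ci][k] * B[k][cj]
    int device;   // GPU the task is (naively) assigned to
};

std::vector<TileTask> buildGemmTasks(int tilesM, int tilesN, int tilesK,
                                     int numGpus) {
    std::vector<TileTask> tasks;
    int next = 0;
    for (int i = 0; i < tilesM; ++i)
        for (int j = 0; j < tilesN; ++j)
            for (int k = 0; k < tilesK; ++k)
                // Tasks sharing (i, j) update the same C tile; a real runtime
                // must serialize or privatize those updates and overlap the
                // tile transfers with computation.
                tasks.push_back({i, j, k, next++ % numGpus});
    return tasks;
}
```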