Partitioning GPUs for Improved Scalability

Johan Janzen, D. Black-Schaffer, Andra Hugo
{"title":"Partitioning GPUs for Improved Scalability","authors":"Johan Janzen, D. Black-Schaffer, Andra Hugo","doi":"10.1109/SBAC-PAD.2016.14","DOIUrl":null,"url":null,"abstract":"To port applications to GPUs, developers need to express computational tasks as highly parallel executions with tens of thousands of threads to fill the GPU's compute resources. However, while this will fill the GPU's resources, it does not necessarily deliver the best efficiency, as the task may scale poorly when run with sufficient parallelism to fill the GPU. In this work we investigate how we can improve throughput by co-scheduling poorly-scaling tasks on sub-partitions of the GPU to increase utilization efficiency. We first investigate the scalability of typical HPC tasks on GPUs, and then use this insight to improve throughput by extending the StarPU framework to co-schedule tasks on the GPU. We demonstrate that co-scheduling poorly-scaling GPU tasks accelerates the execution of the critical tasks of a Cholesky Factorization and improves the overall performance of the application by 9% across a wide range of block sizes.","PeriodicalId":361160,"journal":{"name":"2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SBAC-PAD.2016.14","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13

Abstract

To port applications to GPUs, developers need to express computational tasks as highly parallel executions with tens of thousands of threads to fill the GPU's compute resources. However, while this will fill the GPU's resources, it does not necessarily deliver the best efficiency, as the task may scale poorly when run with sufficient parallelism to fill the GPU. In this work we investigate how we can improve throughput by co-scheduling poorly-scaling tasks on sub-partitions of the GPU to increase utilization efficiency. We first investigate the scalability of typical HPC tasks on GPUs, and then use this insight to improve throughput by extending the StarPU framework to co-schedule tasks on the GPU. We demonstrate that co-scheduling poorly-scaling GPU tasks accelerates the execution of the critical tasks of a Cholesky Factorization and improves the overall performance of the application by 9% across a wide range of block sizes.
划分gpu以提高可伸缩性
为了将应用程序移植到GPU上,开发人员需要将计算任务表达为具有数万个线程的高度并行执行,以填充GPU的计算资源。然而,虽然这将填补GPU的资源,它不一定提供最好的效率,因为任务可能会在运行足够的并行性来填补GPU时进行糟糕的扩展。在这项工作中,我们研究了如何通过在GPU的子分区上共同调度扩展性差的任务来提高吞吐量,以提高利用率。我们首先研究了GPU上典型HPC任务的可扩展性,然后利用这一见解通过扩展StarPU框架来共同调度GPU上的任务来提高吞吐量。我们证明了协同调度低扩展性GPU任务加速了Cholesky分解关键任务的执行,并在广泛的块大小范围内将应用程序的整体性能提高了9%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信