AMOEBA: a coarse grained reconfigurable architecture for dynamic GPU scaling

Proceedings of the 34th ACM International Conference on Supercomputing. Pub Date: 2019-11-08. DOI: 10.1145/3392717.3392738

Xianwei Cheng, Hui Zhao, M. Kandemir, Beilei Jiang, Gayatri Mehta


Citations: 2

Abstract

Different GPU applications exhibit varying scalability patterns, depending on network-on-chip (NoC) behavior, memory coalescing, memory and control divergence, and L1 cache behavior. A GPU consists of several Streaming Multiprocessors (SMs) that collectively determine how shared resources are partitioned and accessed. In recent years, SM scaling has followed two divergent paths: scale-up (fewer, larger SMs) and scale-out (more, smaller SMs). However, neither approach can meet the scalability requirements of all applications running on a given GPU system, which inevitably results in performance degradation and resource under-utilization for some applications. In this work, we investigate the major design parameters that influence GPU scaling. We then propose AMOEBA, a solution to GPU scaling through reconfigurable SM cores. AMOEBA monitors and predicts application scalability at run-time and adjusts the SM configuration to meet program requirements. AMOEBA also enables dynamic creation of heterogeneous SMs through independent fusing or splitting. AMOEBA is a microarchitecture-based solution and requires no additional programming effort or custom compiler support. Our experimental evaluations with application programs from various benchmark suites indicate that AMOEBA achieves a maximum performance gain of 4.3x and an average performance improvement of 47% across all benchmarks tested.
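As a rough illustration of the fuse/split idea described in the abstract, the following toy Python sketch models SMs only by a lane count and reconfigures them from a per-SM score. Everything here is an assumption made for illustration: the `SM` class, the `fuse`, `split`, and `reconfigure` functions, the score range, the threshold, and the minimum lane count are hypothetical and are not taken from the paper, which implements its mechanism in the microarchitecture and does not describe its predictor in the abstract.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SM:
    """Toy streaming multiprocessor, described only by its lane count (hypothetical)."""
    lanes: int


def fuse(a: SM, b: SM) -> SM:
    """Combine two SMs into one larger SM (scale-up)."""
    return SM(a.lanes + b.lanes)


def split(sm: SM) -> List[SM]:
    """Divide one SM into two smaller SMs (scale-out)."""
    half = sm.lanes // 2
    return [SM(half), SM(sm.lanes - half)]


def reconfigure(sms: List[SM], scores: List[float],
                threshold: float = 0.5, min_lanes: int = 16) -> List[SM]:
    """One reconfiguration step over a row of SMs.

    scores[i] is an assumed predictor output in [-1, 1]: values above
    +threshold mean the work on SM i is predicted to favor a larger SM,
    values below -threshold mean it is predicted to favor smaller SMs.
    Each decision is local, so the result can be heterogeneous.
    """
    out: List[SM] = []
    i = 0
    while i < len(sms):
        wants_up = scores[i] > threshold
        if wants_up and i + 1 < len(sms) and scores[i + 1] > threshold:
            # Two neighbors both favor scale-up: fuse them into one SM.
            out.append(fuse(sms[i], sms[i + 1]))
            i += 2
        elif scores[i] < -threshold and sms[i].lanes >= 2 * min_lanes:
            # This SM favors scale-out and is large enough to split.
            out.extend(split(sms[i]))
            i += 1
        else:
            # No strong preference (or no partner): leave it unchanged.
            out.append(sms[i])
            i += 1
    return out


if __name__ == "__main__":
    row = [SM(32) for _ in range(4)]
    print(reconfigure(row, [0.9, 0.8, -0.9, 0.0]))
```

In this sketch the total lane count is conserved across a step, and a single pass can leave large, small, and unchanged SMs side by side, which is one simple way to picture the heterogeneous configurations the abstract mentions.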

