{"title":"The super warp architecture with random address shift","authors":"K. Nakano, Susumu Matsumae","doi":"10.1109/HiPC.2013.6799118","DOIUrl":null,"url":null,"abstract":"The Discrete Memory Machine (DMM) is a theoretical parallel computing model that captures the essence of memory access by a streaming multiprocessor on CUDA-enabled GPUs. The DMM has w memory banks that constitute a shared memory, and each warp of w threads access the shared memory at the same time. However, memory access requests destined for the same memory bank are processed sequentially. Hence, it is very important for developing efficient algorithms to reduce the memory access congestion, the maximum number of memory access requests destined for the same bank. However, it is not easy to minimize the memory access congestion for some problems. The main contribution of this paper is to present novel and practical parallel computing models in which the congestion is small for any memory access requests. We first present the Super Discrete Memory Machine (SDMM), an extended version of the DMM, which supports a super warp with multiple warps. Memory access requests by multiple warps in a super warp are packed through pipeline registers to reduce the memory access congestion. We then go on to apply the random address shift technique to the SDMM. The resulting machine, the Random Super Discrete Memory Machine (RSDMM) can equalize memory access requests by a super warp. Quite surprisingly, for any memory access requests by a super warp on the RSDMM, the overhead of the memory access congestion is within a constant factor of perfectly scheduled memory access. Thus, unlike the DMM, developers of parallel algorithms do not have to consider the memory access congestion on the RSDMM. The congestion on the RSDMM is evaluated by theoretical analysis as well as by experiments.","PeriodicalId":206307,"journal":{"name":"20th Annual International Conference on High Performance Computing","volume":"283 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"20th Annual International Conference on High Performance Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HiPC.2013.6799118","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Cited by: 5
Abstract
The Discrete Memory Machine (DMM) is a theoretical parallel computing model that captures the essence of memory access by a streaming multiprocessor on CUDA-enabled GPUs. The DMM has w memory banks that constitute a shared memory, and each warp of w threads accesses the shared memory at the same time. However, memory access requests destined for the same memory bank are processed sequentially. Hence, to develop efficient algorithms it is very important to reduce the memory access congestion, that is, the maximum number of memory access requests destined for the same bank. However, for some problems it is not easy to minimize the memory access congestion. The main contribution of this paper is to present novel and practical parallel computing models in which the congestion is small for any memory access requests. We first present the Super Discrete Memory Machine (SDMM), an extended version of the DMM that supports a super warp consisting of multiple warps. Memory access requests by the multiple warps in a super warp are packed through pipeline registers to reduce the memory access congestion. We then apply the random address shift technique to the SDMM. The resulting machine, the Random Super Discrete Memory Machine (RSDMM), can equalize the memory access requests of a super warp. Quite surprisingly, for any memory access requests by a super warp on the RSDMM, the overhead due to memory access congestion is within a constant factor of perfectly scheduled memory access. Thus, unlike on the DMM, developers of parallel algorithms do not have to consider the memory access congestion on the RSDMM. The congestion on the RSDMM is evaluated by theoretical analysis as well as by experiments.
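
The sketch below is not from the paper; it is a minimal illustration of the congestion metric the abstract refers to, assuming the common bank mapping bank(a) = a mod w and a per-row random shift as one possible reading of the random address shift idea. The bank width W, the shift table, and the column access pattern are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's implementation) of the
# memory access congestion metric on a DMM-style shared memory with W banks,
# and of how a per-row random address shift can spread a worst-case access
# pattern across banks.
import random
from collections import Counter

W = 32  # assumed number of memory banks = threads per warp


def congestion(addresses, bank):
    """Maximum number of requests destined for the same bank."""
    counts = Counter(bank(a) for a in addresses)
    return max(counts.values())


# Plain mapping: address a goes to bank a mod W.
plain_bank = lambda a: a % W

# Random address shift (illustrative): every row of W consecutive addresses
# gets its own random offset, so bank(a) = (a + shift[a // W]) mod W.
shifts = [random.randrange(W) for _ in range(1024)]
shifted_bank = lambda a: (a + shifts[a // W]) % W

# A column access (stride W) hits a single bank under the plain mapping,
# giving the worst-case congestion W; with random shifts the requests are
# spread over the banks, so the congestion is typically much smaller.
column = [i * W for i in range(W)]
print("plain congestion:  ", congestion(column, plain_bank))    # W
print("shifted congestion:", congestion(column, shifted_bank))  # usually small
```

Running the sketch shows the gap the paper targets: the stride-W pattern costs W sequential bank accesses under the plain mapping, while the randomized mapping behaves like throwing W balls into W bins, so the maximum load is far below W with high probability.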