Optimal data layout for block-level random accesses to scratchpad
Shreyas G. Singapura, R. Kannan, V. Prasanna
2017 IEEE High Performance Extreme Computing Conference (HPEC), published 2017-09-01
DOI: 10.1109/HPEC.2017.8091088 (https://doi.org/10.1109/HPEC.2017.8091088)
Citations: 1
Abstract
3D memory is becoming an increasingly popular technology for closing the performance gap between memory and processors. It has led to new architectures with scratchpad memory, which offer high bandwidth and user-controlled access. Ideally, such a scratchpad would deliver peak bandwidth for any random block access. However, 3D memories guarantee high bandwidth only for certain "ideal" access patterns; for other access patterns, the actual bandwidth is significantly lower. In this paper, we address the challenge of achieving high bandwidth for random block accesses to 3D memory. We present an optimal data layout that achieves maximum bandwidth for each vault, irrespective of which block in the vault is accessed. Our data layout, expressed as a mapping function determined by the architecture parameters, exploits inter-layer pipelining by mapping the elements of each block across the layers of a vault in a specific pattern. As a result, the layout absorbs the latency of accesses to banks in the same layer and, more importantly, hides the latency of accesses to different rows in the same bank, irrespective of the block being accessed. We compare the performance of our proposed data layout with the existing data layout using PARSEC 2.0 benchmarks. Our experimental results demonstrate up to 56% improvement in access time over the existing data layout across various workloads.
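The abstract does not give the mapping function itself, so the following is only an illustrative sketch of the general idea of spreading a block's elements across the layers of a vault. Everything in it is an assumption made for illustration: the name `map_element`, the parameters `block_size`, `num_layers`, `banks_per_layer`, and `row_size`, and the simple round-robin interleaving are hypothetical and should not be read as the authors' layout.

```python
from math import ceil
from typing import NamedTuple


class Location(NamedTuple):
    """Hypothetical physical location of one element inside a vault."""
    layer: int
    bank: int
    row: int
    col: int


def map_element(block_id, elem_idx, *, block_size, num_layers,
                banks_per_layer, row_size):
    """Map element `elem_idx` of block `block_id` to a location in a vault.

    Assumed scheme (not taken from the paper): consecutive elements of a
    block rotate across layers first, then across the banks of a layer,
    so back-to-back accesses within a block target different layers.
    """
    if not 0 <= elem_idx < block_size:
        raise ValueError("elem_idx out of range")

    layer = elem_idx % num_layers          # rotate across layers first
    k = elem_idx // num_layers             # index within the chosen layer
    bank = k % banks_per_layer             # then rotate across banks
    slot = k // banks_per_layer            # position within that bank

    # Each block reserves the same number of slots in every (layer, bank).
    slots_per_block = ceil(block_size / (num_layers * banks_per_layer))
    offset = block_id * slots_per_block + slot
    return Location(layer, bank, offset // row_size, offset % row_size)


if __name__ == "__main__":
    # Hypothetical architecture parameters, chosen only for the demo.
    params = dict(block_size=16, num_layers=4, banks_per_layer=2, row_size=8)
    for e in range(8):
        print(e, map_element(3, e, **params))
```

In this sketch, any block places its first `num_layers` elements on distinct layers, which is the property that would let accesses be pipelined across layers regardless of which block is requested. Whether a given interleaving actually hides row-activation latency depends on the timing parameters of the memory, which the sketch does not model.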