{"title":"使用3d堆叠DRAM的内存数据重组","authors":"Berkin Akin, F. Franchetti, J. Hoe","doi":"10.1145/2749469.2750397","DOIUrl":null,"url":null,"abstract":"In this paper we focus on common data reorganization operations such as shuffle, pack/unpack, swap, transpose, and layout transformations. Although these operations simply relocate the data in the memory, they are costly on conventional systems mainly due to inefficient access patterns, limited data reuse and roundtrip data traversal throughout the memory hierarchy. This paper presents a two pronged approach for efficient data reorganization, which combines (i) a proposed DRAM-aware reshape accelerator integrated within 3D-stacked DRAM, and (ii) a mathematical framework that is used to represent and optimize the reorganization operations. We evaluate our proposed system through two major use cases. First, we demonstrate the reshape accelerator in performing a physical address remapping via data layout transform to utilize the internal parallelism/locality of the 3D-stacked DRAM structure more efficiently for general purpose workloads. Then, we focus on offloading and accelerating commonly used data reorganization routines selected from the Intel Math Kernel Library package. We evaluate the energy and performance benefits of our approach by comparing it against existing optimized implementations on state-of-the-art GPUs and CPUs. For the various test cases, in-memory data reorganization provides orders of magnitude performance and energy efficiency improvements via low overhead hardware.","PeriodicalId":6878,"journal":{"name":"2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)","volume":"78 1","pages":"131-143"},"PeriodicalIF":0.0000,"publicationDate":"2015-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"178","resultStr":"{\"title\":\"Data reorganization in memory using 3D-stacked DRAM\",\"authors\":\"Berkin Akin, F. Franchetti, J. Hoe\",\"doi\":\"10.1145/2749469.2750397\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper we focus on common data reorganization operations such as shuffle, pack/unpack, swap, transpose, and layout transformations. Although these operations simply relocate the data in the memory, they are costly on conventional systems mainly due to inefficient access patterns, limited data reuse and roundtrip data traversal throughout the memory hierarchy. This paper presents a two pronged approach for efficient data reorganization, which combines (i) a proposed DRAM-aware reshape accelerator integrated within 3D-stacked DRAM, and (ii) a mathematical framework that is used to represent and optimize the reorganization operations. We evaluate our proposed system through two major use cases. First, we demonstrate the reshape accelerator in performing a physical address remapping via data layout transform to utilize the internal parallelism/locality of the 3D-stacked DRAM structure more efficiently for general purpose workloads. Then, we focus on offloading and accelerating commonly used data reorganization routines selected from the Intel Math Kernel Library package. We evaluate the energy and performance benefits of our approach by comparing it against existing optimized implementations on state-of-the-art GPUs and CPUs. For the various test cases, in-memory data reorganization provides orders of magnitude performance and energy efficiency improvements via low overhead hardware.\",\"PeriodicalId\":6878,\"journal\":{\"name\":\"2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)\",\"volume\":\"78 1\",\"pages\":\"131-143\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-06-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"178\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2749469.2750397\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2749469.2750397","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 178
摘要
在本文中,我们着重于常见的数据重组操作,如shuffle, pack/unpack, swap,转置和布局转换。尽管这些操作只是在内存中重新定位数据,但在传统系统上,由于低效的访问模式、有限的数据重用和在整个内存层次结构中的往返数据遍历,它们的成本很高。本文提出了一种双管齐下的有效数据重组方法,该方法结合了(i)在3d堆叠DRAM中集成的ram感知重塑加速器,以及(ii)用于表示和优化重组操作的数学框架。我们通过两个主要用例来评估我们建议的系统。首先,我们演示了重塑加速器通过数据布局转换执行物理地址重新映射,以更有效地利用3d堆叠DRAM结构的内部并行性/局部性,用于通用工作负载。然后,我们着重于卸载和加速从Intel Math Kernel Library包中选择的常用数据重组例程。我们通过将我们的方法与现有的最先进的gpu和cpu上的优化实现进行比较,来评估我们的方法的能源和性能优势。对于各种测试用例,内存中的数据重组通过低开销硬件提供了数量级的性能和能源效率改进。
Data reorganization in memory using 3D-stacked DRAM
In this paper we focus on common data reorganization operations such as shuffle, pack/unpack, swap, transpose, and layout transformations. Although these operations simply relocate the data in the memory, they are costly on conventional systems mainly due to inefficient access patterns, limited data reuse and roundtrip data traversal throughout the memory hierarchy. This paper presents a two pronged approach for efficient data reorganization, which combines (i) a proposed DRAM-aware reshape accelerator integrated within 3D-stacked DRAM, and (ii) a mathematical framework that is used to represent and optimize the reorganization operations. We evaluate our proposed system through two major use cases. First, we demonstrate the reshape accelerator in performing a physical address remapping via data layout transform to utilize the internal parallelism/locality of the 3D-stacked DRAM structure more efficiently for general purpose workloads. Then, we focus on offloading and accelerating commonly used data reorganization routines selected from the Intel Math Kernel Library package. We evaluate the energy and performance benefits of our approach by comparing it against existing optimized implementations on state-of-the-art GPUs and CPUs. For the various test cases, in-memory data reorganization provides orders of magnitude performance and energy efficiency improvements via low overhead hardware.