基于GoblinCore-64架构的并发动态内存合并

Proceedings of the Second International Symposium on Memory Systems Pub Date : 2016-10-03 DOI:10.1145/2989081.2989128

Xi Wang, John D. Leidel, Yong Chen

{"title":"基于GoblinCore-64架构的并发动态内存合并","authors":"Xi Wang, John D. Leidel, Yong Chen","doi":"10.1145/2989081.2989128","DOIUrl":null,"url":null,"abstract":"The majority of modern microprocessors are architected to utilize multi-level data caches as a primary optimization to reduce the latency and increase the perceived bandwidth from an application. The spatial and temporal locality provided by data caches work well in conjunction with applications that access memory in a linear fashion. However, applications that exhibit random or non-deterministic memory access patterns often induce a significant number of data cache misses, thus reducing the natural performance benefit from the data cache. In response to the performance penalties inherently present with non-deterministic applications, we have constructed a unique memory hierarchy within the GoblinCore-64 (GC64) architecture explicitly designed to exploit memory performance from irregular memory access patterns. The GC64 architecture combines a RISC-V-based core coupled with latency-hiding architectural features to a memory hierarchy with Hybrid Memory Cube (HMC) devices. In order to cope with the inherent non-determinism of applications and to exploit the packetized interface presented by the HMC device, we develop a methodology and associated implementation of a dynamic memory coalescing unit for the GC64 memory hierarchy that permits us to statistically sample memory requests from non-deterministic applications and coalesce them into the largest possible HMC payload requests. In this work, we present two parallel methodologies and associated implementations for coalescing non-deterministic memory requests into the largest potential HMC request by constructing a binary tree representation of the live memory requests from disparate cores. We present the coalesced HMC memory request results from applications that exhibit linear and non-linear memory request patterns compiled for a RISC-V core in contrast with a traditional memory hierarchy.","PeriodicalId":283512,"journal":{"name":"Proceedings of the Second International Symposium on Memory Systems","volume":"33 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":"{\"title\":\"Concurrent Dynamic Memory Coalescing on GoblinCore-64 Architecture\",\"authors\":\"Xi Wang, John D. Leidel, Yong Chen\",\"doi\":\"10.1145/2989081.2989128\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The majority of modern microprocessors are architected to utilize multi-level data caches as a primary optimization to reduce the latency and increase the perceived bandwidth from an application. The spatial and temporal locality provided by data caches work well in conjunction with applications that access memory in a linear fashion. However, applications that exhibit random or non-deterministic memory access patterns often induce a significant number of data cache misses, thus reducing the natural performance benefit from the data cache. In response to the performance penalties inherently present with non-deterministic applications, we have constructed a unique memory hierarchy within the GoblinCore-64 (GC64) architecture explicitly designed to exploit memory performance from irregular memory access patterns. The GC64 architecture combines a RISC-V-based core coupled with latency-hiding architectural features to a memory hierarchy with Hybrid Memory Cube (HMC) devices. In order to cope with the inherent non-determinism of applications and to exploit the packetized interface presented by the HMC device, we develop a methodology and associated implementation of a dynamic memory coalescing unit for the GC64 memory hierarchy that permits us to statistically sample memory requests from non-deterministic applications and coalesce them into the largest possible HMC payload requests. In this work, we present two parallel methodologies and associated implementations for coalescing non-deterministic memory requests into the largest potential HMC request by constructing a binary tree representation of the live memory requests from disparate cores. We present the coalesced HMC memory request results from applications that exhibit linear and non-linear memory request patterns compiled for a RISC-V core in contrast with a traditional memory hierarchy.\",\"PeriodicalId\":283512,\"journal\":{\"name\":\"Proceedings of the Second International Symposium on Memory Systems\",\"volume\":\"33 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-10-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"13\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Second International Symposium on Memory Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2989081.2989128\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Second International Symposium on Memory Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2989081.2989128","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 13

摘要

大多数现代微处理器的架构都是利用多级数据缓存作为主要优化，以减少延迟并增加应用程序的感知带宽。数据缓存提供的空间和时间局部性可以很好地与以线性方式访问内存的应用程序配合使用。然而，表现出随机或非确定性内存访问模式的应用程序通常会导致大量的数据缓存丢失，从而降低了数据缓存带来的自然性能收益。为了应对非确定性应用程序固有的性能损失，我们在GoblinCore-64 (GC64)体系结构中构造了一个独特的内存层次结构，明确地设计用于从不规则内存访问模式中利用内存性能。GC64架构将基于risc - v的核心与延迟隐藏架构特性结合到具有混合内存立方体(HMC)设备的内存层次结构中。为了应对应用程序固有的不确定性，并利用HMC设备提供的打包接口，我们为GC64内存层次结构开发了一种动态内存合并单元的方法和相关实现，该单元允许我们从非确定性应用程序中统计采样内存请求，并将它们合并为最大可能的HMC有效负载请求。在这项工作中，我们提出了两种并行方法和相关实现，通过构建来自不同核心的实时内存请求的二叉树表示，将不确定性内存请求合并为最大的潜在HMC请求。我们展示了来自应用程序的合并HMC内存请求结果，这些应用程序显示了为RISC-V内核编译的线性和非线性内存请求模式，而不是传统的内存层次结构。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Concurrent Dynamic Memory Coalescing on GoblinCore-64 Architecture

The majority of modern microprocessors are architected to utilize multi-level data caches as a primary optimization to reduce the latency and increase the perceived bandwidth from an application. The spatial and temporal locality provided by data caches work well in conjunction with applications that access memory in a linear fashion. However, applications that exhibit random or non-deterministic memory access patterns often induce a significant number of data cache misses, thus reducing the natural performance benefit from the data cache. In response to the performance penalties inherently present with non-deterministic applications, we have constructed a unique memory hierarchy within the GoblinCore-64 (GC64) architecture explicitly designed to exploit memory performance from irregular memory access patterns. The GC64 architecture combines a RISC-V-based core coupled with latency-hiding architectural features to a memory hierarchy with Hybrid Memory Cube (HMC) devices. In order to cope with the inherent non-determinism of applications and to exploit the packetized interface presented by the HMC device, we develop a methodology and associated implementation of a dynamic memory coalescing unit for the GC64 memory hierarchy that permits us to statistically sample memory requests from non-deterministic applications and coalesce them into the largest possible HMC payload requests. In this work, we present two parallel methodologies and associated implementations for coalescing non-deterministic memory requests into the largest potential HMC request by constructing a binary tree representation of the live memory requests from disparate cores. We present the coalesced HMC memory request results from applications that exhibit linear and non-linear memory request patterns compiled for a RISC-V core in contrast with a traditional memory hierarchy.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the Second International Symposium on Memory Systems

自引率

0.00%

发文量