Randomizing Packet Memory Networks for Low-Latency Processor-Memory Communication
Daichi Fujiki, Hiroki Matsutani, M. Koibuchi, H. Amano
2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP). DOI: 10.1109/PDP.2016.18. Published 2016-04-04.
Three-dimensional stacked memory is considered one of the key innovations for next-generation computing systems, as it provides high bandwidth and energy efficiency. In particular, the packet routing capability of Hybrid Memory Cubes (HMCs) enables new interconnects among memories, adding flexibility to their topological design space. Since memory-processor communication is latency-sensitive, our challenge is to reduce the latency of the memory interconnection network, which suffers high overheads as hop counts increase. Interestingly, random network topologies are known to have a remarkably low diameter, comparable even to that of the theoretical Moore graph. In this context, we first propose to exploit random topologies for memory networks. Second, we propose several optimizations that further adapt the random topologies to latency-sensitive memory-processor communication: communication-path-length-based topology selection, deterministic minimal routing, and page-size-granularity memory mapping. Finally, we present the results of our evaluation: random networks with universal memory access outperformed non-random networks in which memory access was optimally localized.
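The abstract names its optimizations without detail. The following is a minimal, self-contained sketch of only the first idea (communication-path-length-based topology selection), not code from the paper: it generates a few random regular topologies, keeps the one with the lowest average shortest-path length, and compares its diameter with the Moore bound mentioned above. The node count (64), link degree (4), and trial count are illustrative assumptions, not parameters taken from the paper.

```python
# Illustrative sketch (not from the paper): pick the random regular topology
# with the smallest average shortest path, then compare against the Moore bound.
import random
from collections import deque

def random_regular_graph(n, d, seed):
    # Pairing-model generator; retries until a simple d-regular graph is produced.
    rng = random.Random(seed)
    while True:
        stubs = [v for v in range(n) for _ in range(d)]
        rng.shuffle(stubs)
        edges, ok = set(), True
        for u, v in zip(stubs[::2], stubs[1::2]):
            if u == v or (u, v) in edges or (v, u) in edges:
                ok = False
                break
            edges.add((u, v))
        if ok:
            adj = [[] for _ in range(n)]
            for u, v in edges:
                adj[u].append(v)
                adj[v].append(u)
            return adj

def bfs_dists(adj, src):
    # Unweighted shortest-path distances from src (hop counts).
    dist = [-1] * len(adj)
    dist[src] = 0
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if dist[v] < 0:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def avg_path_and_diameter(adj):
    # Assumes the graph is connected (true with high probability for d >= 3).
    total, count, diam = 0, 0, 0
    for s in range(len(adj)):
        for d_ in bfs_dists(adj, s):
            if d_ > 0:
                total += d_
                count += 1
                diam = max(diam, d_)
    return total / count, diam

def moore_bound_nodes(d, k):
    # Maximum node count of any graph with degree d and diameter k (Moore bound).
    return 1 + d * sum((d - 1) ** i for i in range(k))

if __name__ == "__main__":
    n, d, trials = 64, 4, 16  # illustrative network size, link degree, candidate count
    best = min(avg_path_and_diameter(random_regular_graph(n, d, s)) + (s,)
               for s in range(trials))
    apl, diam, seed = best
    print(f"best of {trials} random topologies (seed {seed}): "
          f"avg path {apl:.2f}, diameter {diam}")
    print(f"Moore bound for degree {d}, diameter {diam}: "
          f"{moore_bound_nodes(d, diam)} nodes")
```

The selection step mirrors the paper's motivation at a toy scale: among candidate random topologies of equal cost (same degree and node count), prefer the one that minimizes the average hop count seen by latency-sensitive memory traffic.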