Design space for scaling-in general purpose computing within the DDR DRAM hierarchy for map-reduce workloads

S. Rai, A. Sivasubramaniam, Adithya Kumar, Prasanna Venkatesh Rengasamy, N. Vijaykrishnan, Ameen Akel, S. Eilert
{"title":"在DDR DRAM层次结构中设计用于伸缩的通用计算空间,以减少映射工作负载","authors":"S. Rai, A. Sivasubramaniam, Adithya Kumar, Prasanna Venkatesh Rengasamy, N. Vijaykrishnan, Ameen Akel, S. Eilert","doi":"10.1145/3457388.3458661","DOIUrl":null,"url":null,"abstract":"This paper conducts a design space exploration of placing general purpose RISCV cores within the DDR DRAM hierarchy to boost the performance of important data analytics applications in the datacenter. We investigate the hardware (where? how many? how to interface?) and software (how to place data? how to map computations?) choices for placing these cores within the rank, chip, and bank of the DIMM slots to take advantage of the locality vs. parallelism trade-offs. We use the popular MapReduce paradigm, normally used to scale out workloads across servers, to scale in these workloads into the DDR DRAM hierarchy. We evaluate the design space using diverse off-the-shelf Apache Spark Workloads to show the pros-and-cons of different hardware placement and software mapping strategies. Results show that bank-level RISCV cores can provide tremendous speedup (up to 363X) for the offload-able parts of these applications, amounting to 14X speedup overall in some applications. Even in the non-amenable applications, we get at least 31% performance boost for the entire application. To realize this, we incur an area overhead of 4% at the bank level, and increase in temperature of < 4°C over the chip averaged over all applications.","PeriodicalId":136482,"journal":{"name":"Proceedings of the 18th ACM International Conference on Computing Frontiers","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Design space for scaling-in general purpose computing within the DDR DRAM hierarchy for map-reduce workloads\",\"authors\":\"S. Rai, A. Sivasubramaniam, Adithya Kumar, Prasanna Venkatesh Rengasamy, N. Vijaykrishnan, Ameen Akel, S. Eilert\",\"doi\":\"10.1145/3457388.3458661\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper conducts a design space exploration of placing general purpose RISCV cores within the DDR DRAM hierarchy to boost the performance of important data analytics applications in the datacenter. We investigate the hardware (where? how many? how to interface?) and software (how to place data? how to map computations?) choices for placing these cores within the rank, chip, and bank of the DIMM slots to take advantage of the locality vs. parallelism trade-offs. We use the popular MapReduce paradigm, normally used to scale out workloads across servers, to scale in these workloads into the DDR DRAM hierarchy. We evaluate the design space using diverse off-the-shelf Apache Spark Workloads to show the pros-and-cons of different hardware placement and software mapping strategies. Results show that bank-level RISCV cores can provide tremendous speedup (up to 363X) for the offload-able parts of these applications, amounting to 14X speedup overall in some applications. Even in the non-amenable applications, we get at least 31% performance boost for the entire application. 
To realize this, we incur an area overhead of 4% at the bank level, and increase in temperature of < 4°C over the chip averaged over all applications.\",\"PeriodicalId\":136482,\"journal\":{\"name\":\"Proceedings of the 18th ACM International Conference on Computing Frontiers\",\"volume\":\"8 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-05-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 18th ACM International Conference on Computing Frontiers\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3457388.3458661\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 18th ACM International Conference on Computing Frontiers","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3457388.3458661","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3

Abstract

This paper conducts a design space exploration of placing general-purpose RISCV cores within the DDR DRAM hierarchy to boost the performance of important data analytics applications in the datacenter. We investigate the hardware choices (where? how many? how to interface?) and software choices (how to place data? how to map computations?) for placing these cores within the rank, chip, and bank of the DIMM slots to exploit the trade-off between locality and parallelism. We use the popular MapReduce paradigm, normally used to scale out workloads across servers, to scale in these workloads into the DDR DRAM hierarchy. We evaluate the design space using diverse off-the-shelf Apache Spark workloads to show the pros and cons of different hardware placement and software mapping strategies. Results show that bank-level RISCV cores can provide tremendous speedup (up to 363X) for the offloadable parts of these applications, amounting to an overall speedup of 14X in some applications. Even in the non-amenable applications, we see at least a 31% performance boost for the entire application. To realize this, we incur an area overhead of 4% at the bank level and a temperature increase of less than 4°C across the chip, averaged over all applications.
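
To make the scale-in idea concrete, the sketch below is a minimal, self-contained toy in plain Python (not the paper's framework, nor Spark itself): it partitions a small dataset by DRAM bank, runs the map phase over each bank-local partition the way a bank-level core would, and merges the small per-bank results in a single host-side reduce. The bank count and the names partition_by_bank, map_on_bank, and reduce_across_banks are illustrative assumptions, not interfaces described in the paper.

    # Toy illustration of scaling-in a map-reduce job: the map phase runs
    # over bank-local partitions (standing in for bank-level RISCV cores),
    # and only the small per-bank summaries are merged on the host.
    from collections import Counter
    from functools import reduce

    NUM_BANKS = 16  # assumed number of bank-level compute units, for illustration only

    def partition_by_bank(records, num_banks=NUM_BANKS):
        # Round-robin data placement across bank partitions (one possible policy).
        partitions = [[] for _ in range(num_banks)]
        for i, rec in enumerate(records):
            partitions[i % num_banks].append(rec)
        return partitions

    def map_on_bank(partition):
        # Map phase executed "near" the data: a word count over one bank's rows.
        counts = Counter()
        for line in partition:
            counts.update(line.split())
        return counts

    def reduce_across_banks(per_bank_counts):
        # Reduce phase: merge the compact per-bank results on the host side.
        return reduce(lambda a, b: a + b, per_bank_counts, Counter())

    if __name__ == "__main__":
        corpus = ["scale in not out", "map near the data", "reduce on the host"]
        banks = partition_by_bank(corpus)
        print(reduce_across_banks(map_on_bank(p) for p in banks).most_common(3))

The point of the sketch is the data-movement pattern: full records never leave their bank partition; only the much smaller map outputs cross to the host for the reduce step, which is the locality benefit that bank-level placement targets.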