S. Rai, A. Sivasubramaniam, Adithya Kumar, Prasanna Venkatesh Rengasamy, N. Vijaykrishnan, Ameen Akel, S. Eilert
Proceedings of the 18th ACM International Conference on Computing Frontiers, published 2021-05-11. DOI: 10.1145/3457388.3458661 (https://doi.org/10.1145/3457388.3458661)
Design space for scaling-in general purpose computing within the DDR DRAM hierarchy for map-reduce workloads
This paper conducts a design space exploration of placing general-purpose RISC-V cores within the DDR DRAM hierarchy to boost the performance of important data analytics applications in the datacenter. We investigate the hardware choices (where? how many? how to interface?) and software choices (how to place data? how to map computations?) for placing these cores within the rank, chip, and bank of the DIMM slots to take advantage of the locality vs. parallelism trade-offs. We use the popular MapReduce paradigm, normally used to scale out workloads across servers, to scale these workloads in to the DDR DRAM hierarchy. We evaluate the design space using diverse off-the-shelf Apache Spark workloads to show the pros and cons of different hardware placement and software mapping strategies. Results show that bank-level RISC-V cores can provide tremendous speedup (up to 363X) for the offloadable parts of these applications, amounting to a 14X overall speedup in some applications. Even in the non-amenable applications, we get a performance boost of at least 31% for the entire application. To realize this, we incur an area overhead of 4% at the bank level and an increase in temperature of < 4°C over the chip, averaged over all applications.
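To make the "scale in" idea concrete, the following is a minimal Python sketch of a MapReduce word count whose map step runs on bank-local data and whose reduce step merges partial results up a bank, chip, rank hierarchy. It is an illustration under stated assumptions, not the authors' implementation: the topology constants, the round-robin placement policy, and all function names (`place_data`, `bank_map`, `merge`, `scale_in_word_count`) are hypothetical, and the cores are simulated as ordinary function calls.

```python
from collections import Counter
from functools import reduce

# Hypothetical topology, chosen only for illustration (not taken from the paper).
RANKS, CHIPS_PER_RANK, BANKS_PER_CHIP = 2, 2, 4


def place_data(records, n_banks):
    """Assumed placement policy: record i goes to bank i % n_banks."""
    banks = [[] for _ in range(n_banks)]
    for i, rec in enumerate(records):
        banks[i % n_banks].append(rec)
    return banks


def bank_map(local_records):
    """Map step run by a simulated bank-level core on its local records only."""
    return Counter(word for rec in local_records for word in rec.split())


def merge(partials):
    """Reduce step: combine the partial counts produced one level below."""
    return reduce(lambda a, b: a + b, partials, Counter())


def scale_in_word_count(records):
    n_banks = RANKS * CHIPS_PER_RANK * BANKS_PER_CHIP
    per_bank = [bank_map(b) for b in place_data(records, n_banks)]
    # Reduce up the hierarchy: banks -> chips -> ranks -> host.
    per_chip = [merge(per_bank[i:i + BANKS_PER_CHIP])
                for i in range(0, n_banks, BANKS_PER_CHIP)]
    per_rank = [merge(per_chip[i:i + CHIPS_PER_RANK])
                for i in range(0, len(per_chip), CHIPS_PER_RANK)]
    return merge(per_rank)
```

In this sketch the map step touches only bank-local records (locality), while the number of banks sets how many map tasks can run at once (parallelism). Changing the placement policy or the level at which partial results are merged is one way to picture the software mapping choices the abstract refers to.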