Runtime-Guided Management of Stacked DRAM Memories in Task Parallel Programs

Proceedings of the 2018 International Conference on Supercomputing. Pub Date: 2018-06-12. DOI: 10.1145/3205289.3205312

Lluc Alvarez, Marc Casas, Jesús Labarta, E. Ayguadé, M. Valero, Miquel Moretó


Citations: 16

Abstract

Stacked DRAM memories have become a reality in High-Performance Computing (HPC) architectures. These memories provide much higher bandwidth while consuming less power than traditional off-chip memories, but their limited capacity is insufficient for modern HPC systems. For this reason, stacked DRAM and off-chip memories are expected to co-exist in HPC architectures, giving rise to different approaches for architecting the stacked DRAM in the system. This paper proposes a runtime approach to transparently manage stacked DRAM memories in task-based programming models. In this approach, the runtime system is in charge of copying the data accessed by the tasks to the stacked DRAM, without complex hardware support or modifications to the application code. To mitigate the cost of copying data between the stacked DRAM and the off-chip memory, the proposal includes an optimization that parallelizes the copies across idle or additional helper threads. In addition, the runtime system is aware of the reuse pattern of the data accessed by the tasks, and can exploit this information to avoid copies to the stacked DRAM that would not pay off. Results on the Intel Knights Landing processor show that the proposed techniques achieve an average speedup of 14% over the state-of-the-art library for managing the stacked DRAM and 29% over a stacked DRAM architected as a hardware cache.
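The abstract does not give the runtime's decision rule, so the following is only an illustrative sketch of what a reuse-aware copy decision and a split of one copy across helper threads could look like. The cost model, the bandwidth figures, and the names `MemoryModel`, `should_copy`, and `split_copy` are all assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryModel:
    """Hypothetical bandwidths in GB/s (placeholder values, not measurements)."""
    stacked_bw: float = 400.0
    offchip_bw: float = 90.0


def should_copy(nbytes: int, expected_reuses: int, free_stacked_bytes: int,
                mem: MemoryModel = MemoryModel()) -> bool:
    """Assumed rule: copy only if the time saved by later accesses exceeds the copy time."""
    if nbytes > free_stacked_bytes:
        return False  # the data does not fit in the stacked DRAM
    # Assumption: the copy is bounded by the slower, off-chip side.
    copy_time = nbytes / (mem.offchip_bw * 1e9)
    # Assumption: each reuse streams the whole region once.
    saved_per_reuse = nbytes / (mem.offchip_bw * 1e9) - nbytes / (mem.stacked_bw * 1e9)
    return expected_reuses * saved_per_reuse > copy_time


def split_copy(nbytes: int, n_threads: int) -> list[tuple[int, int]]:
    """Split one copy into (offset, length) chunks, one per idle or helper thread."""
    base, extra = divmod(nbytes, n_threads)
    chunks, offset = [], 0
    for i in range(n_threads):
        length = base + (1 if i < extra else 0)
        if length > 0:  # skip empty chunks when there are more threads than bytes
            chunks.append((offset, length))
        offset += length
    return chunks
```

Under these placeholder bandwidths, a region that is accessed only once is left in off-chip memory, while one that is accessed twice or more is copied. A real runtime would presumably derive the reuse information from task data dependences rather than take it as an argument.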


