Exploiting Data Similarity to Reduce Memory Footprints

2011 IEEE International Parallel & Distributed Processing Symposium Pub Date : 2011-05-16 DOI:10.1109/IPDPS.2011.24

Susmit Biswas, B. Supinski, M. Schulz, D. Franklin, T. Sherwood, F. Chong


Citations: 28

Abstract

Memory size has long limited large-scale applications on high-performance computing (HPC) systems. Since compute nodes frequently do not have swap space, physical memory often limits problem sizes. Increasing core counts per chip and power density constraints, which limit the number of DIMMs per node, have exacerbated this problem. Further, DRAM constitutes a significant portion of overall HPC system cost. Therefore, instead of adding more DRAM to the nodes, mechanisms that manage memory usage more efficiently, preferably transparently, could increase effective DRAM capacity and thus the benefit of multicore nodes for HPC systems. MPI application processes often exhibit significant data similarity. These similar data regions occupy multiple physical locations across the individual rank processes within a multicore node and thus offer potential savings in memory capacity. These regions, which reside primarily in the heap, are dynamic, which makes them difficult to manage statically. Our novel memory allocation library, SBLLmalloc, automatically identifies identical memory blocks and merges them into a single copy. Our implementation is transparent to the application and does not require any kernel modifications. Overall, we demonstrate that SBLLmalloc reduces the memory footprint of a range of MPI applications by 32.03% on average and by up to 60.87%. Further, SBLLmalloc supports problem sizes for IRS over 21.36% larger than those possible with standard memory management techniques, thus significantly increasing effective system size. Similarly, SBLLmalloc requires 43.75% fewer nodes than standard memory management techniques to solve an AMG problem.
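The core idea in the abstract, finding identical memory blocks across rank processes and keeping a single copy, can be illustrated with a small simulation. The sketch below is not the SBLLmalloc implementation, which the abstract does not detail. It is a toy model under stated assumptions: each process's heap is a plain `bytes` object, the block size of 4096 bytes is an assumed granularity, and the names `split_blocks`, `merge_identical_blocks`, and `footprint_reduction` are hypothetical helpers introduced only for illustration.

```python
import hashlib
from typing import Dict, List, Tuple

BLOCK_SIZE = 4096  # assumed block granularity; not a value taken from the paper


def split_blocks(heap: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Split a heap image into fixed-size blocks (the last one may be shorter)."""
    return [heap[i:i + block_size] for i in range(0, len(heap), block_size)]


def merge_identical_blocks(
    heaps: List[bytes], block_size: int = BLOCK_SIZE
) -> Tuple[List[List[int]], List[bytes]]:
    """Build one block table per process plus a shared pool of unique blocks.

    Every table entry is an index into the pool, so identical blocks held by
    different processes refer to a single stored copy.
    """
    pool: List[bytes] = []
    by_digest: Dict[bytes, List[int]] = {}
    tables: List[List[int]] = []
    for heap in heaps:
        table: List[int] = []
        for block in split_blocks(heap, block_size):
            digest = hashlib.sha256(block).digest()
            candidates = by_digest.setdefault(digest, [])
            # The digest only narrows the search; a byte-wise comparison
            # confirms that the blocks really are identical before merging.
            for idx in candidates:
                if pool[idx] == block:
                    table.append(idx)
                    break
            else:
                pool.append(block)
                candidates.append(len(pool) - 1)
                table.append(len(pool) - 1)
        tables.append(table)
    return tables, pool


def footprint_reduction(heaps: List[bytes], pool: List[bytes]) -> float:
    """Fraction of memory saved by storing only the unique blocks."""
    before = sum(len(h) for h in heaps)
    after = sum(len(b) for b in pool)
    return 0.0 if before == 0 else 1.0 - after / before


if __name__ == "__main__":
    # Four simulated ranks: three blocks common to all ranks, one private block each.
    shared = b"".join(bytes([v]) * BLOCK_SIZE for v in (1, 2, 3))
    heaps = [shared + bytes([10 + rank]) * BLOCK_SIZE for rank in range(4)]
    tables, pool = merge_identical_blocks(heaps)
    print(f"unique blocks: {len(pool)}, reduction: {footprint_reduction(heaps, pool):.2%}")
```

In this made-up input, the four ranks hold 16 blocks in total, of which 7 are unique, so the simulated footprint shrinks by 9/16 = 56.25%. That figure describes only the toy data and says nothing about the measurements reported in the paper. A real allocator would also have to give a process its own private copy again as soon as it writes to a merged block; the sketch leaves that step out.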



