{"title":"Analyzing and Improving Memory Access Patterns of Large Irregular Applications on NUMA Machines","authors":"Artur Mariano, M. Diener, C. Bischof, P. Navaux","doi":"10.1109/PDP.2016.37","DOIUrl":null,"url":null,"abstract":"Improving the memory access behavior of parallel applications is one of the most important challenges in high-performance computing. Non-Uniform Memory Access (NUMA) architectures pose particular challenges in this context: they contain multiple memory controllers and the selection of a controller to serve a page request influences the overall locality and balance of memory accesses, which in turn affect performance. In this paper, we analyze and improve the memory access pattern and overall memory usage of large-scale irregular applications on NUMA machines. We selected HashSieve, a very important algorithm in the context of lattice-based cryptography, as a representative example, due to (1) its extremely irregular memory pattern, (2) large memory requirements and (3) unsuitability to other computer architectures, such as GPUs. We optimize HashSieve with a variety of techniques, focusing both on the algorithm itself as well as the mapping of memory pages to NUMA nodes, achieving a speedup of over 2x.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PDP.2016.37","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6
Abstract
Improving the memory access behavior of parallel applications is one of the most important challenges in high-performance computing. Non-Uniform Memory Access (NUMA) architectures pose particular challenges in this context: they contain multiple memory controllers and the selection of a controller to serve a page request influences the overall locality and balance of memory accesses, which in turn affect performance. In this paper, we analyze and improve the memory access pattern and overall memory usage of large-scale irregular applications on NUMA machines. We selected HashSieve, a very important algorithm in the context of lattice-based cryptography, as a representative example, due to (1) its extremely irregular memory pattern, (2) large memory requirements and (3) unsuitability to other computer architectures, such as GPUs. We optimize HashSieve with a variety of techniques, focusing both on the algorithm itself as well as the mapping of memory pages to NUMA nodes, achieving a speedup of over 2x.