Introducing kernel-level page reuse for high performance computing

Workshop on Memory System Performance and Correctness. Pub Date: 2013-06-16. DOI: 10.1145/2492408.2492414

S. Valat, Marc Pérache, W. Jalby


Citations: 10

Abstract

Due to the evolution of computer architectures, more and more HPC applications have to adopt thread-based parallelism and pay attention to memory consumption. These changes call for closer attention to the full memory management chain, which is particularly stressed in multi-threaded contexts. Several memory allocators provide better scalability on the user-space side. But with the steadily increasing number of cores, the impact of the operating system can no longer be neglected. We measured that the OS memory subsystem can account for up to one third of the total execution time of a real application on 128 cores. On modern architectures, we measured that up to 40% of the page fault time is spent zeroing pages. In this paper, we detail a proposal to improve paging performance by removing the need for this unproductive page zeroing through an extension of the mmap semantics. To this end, we added a per-process, kernel-level memory page pool that locally reuses free pages without resetting their content. Our experiments show significant performance improvements, especially for huge pages.
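The page-pool idea in the abstract can be illustrated with a small user-space toy model. This is only a sketch under stated assumptions, not the authors' kernel implementation: the names GlobalAllocator, ProcessPagePool, and run_workload, the 4096-byte page size, and the workload shape are all hypothetical. It counts how many pages must be zeroed when a process repeatedly maps and unmaps memory, with and without a per-process pool that hands freed pages back uncleared.

```python
"""Toy user-space model of per-process page reuse (illustrative only)."""

PAGE_SIZE = 4096  # assumed base page size in bytes


class GlobalAllocator:
    """Stands in for the system-wide page allocator (hypothetical)."""

    def __init__(self):
        self.zeroed_pages = 0  # how many pages had to be cleared

    def alloc_zeroed(self):
        # A page from the system-wide pool may hold another process's data,
        # so it has to be zeroed before it is handed out.
        self.zeroed_pages += 1
        return bytearray(PAGE_SIZE)

    def free(self, page):
        pass  # the toy model simply drops the page


class ProcessPagePool:
    """Per-process pool that returns freed pages without clearing them."""

    def __init__(self, backend, allow_reuse=True):
        self.backend = backend
        self.allow_reuse = allow_reuse
        self.free_pages = []

    def fault_in(self):
        if self.allow_reuse and self.free_pages:
            # Stale content, but it only ever belonged to this process.
            return self.free_pages.pop()
        return self.backend.alloc_zeroed()

    def release(self, page):
        if self.allow_reuse:
            self.free_pages.append(page)
        else:
            self.backend.free(page)


def run_workload(allow_reuse, iterations=100, pages_per_iteration=8):
    """Map, touch, and unmap pages repeatedly; return the zeroing count."""
    backend = GlobalAllocator()
    pool = ProcessPagePool(backend, allow_reuse)
    for _ in range(iterations):
        pages = [pool.fault_in() for _ in range(pages_per_iteration)]
        for page in pages:
            page[0] = 0xFF  # touch the page, as an application would
        for page in pages:
            pool.release(page)
    return backend.zeroed_pages


if __name__ == "__main__":
    print("zeroed without reuse:", run_workload(allow_reuse=False))
    print("zeroed with reuse:   ", run_workload(allow_reuse=True))
```

With the assumed workload of 100 iterations over 8 pages, the model zeroes 800 pages without reuse and only 8 with reuse, since every page after the first iteration comes from the process's own pool. The model says nothing about real timings; it only shows why local reuse can skip the zeroing step without exposing data across processes.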

