Compiler support for selective page migration in NUMA architectures

G. Piccoli, Henrique Nazaré, R. E. Rodrigues, C. Pousa, E. Borin, Fernando Magno Quintão Pereira
{"title":"编译器对NUMA架构中选择性页面迁移的支持","authors":"G. Piccoli, Henrique Nazaré, R. E. Rodrigues, C. Pousa, E. Borin, Fernando Magno Quintão Pereira","doi":"10.1145/2628071.2628077","DOIUrl":null,"url":null,"abstract":"Current high-performance multicore processors provide users with a non-uniform memory access model (NUMA). These systems perform better when threads access data on memory banks next to the core where they run. However, ensuring data locality is difficult. In this paper, we propose compiler analyses and code generation methods to support a lightweight runtime system that dynamically migrates memory pages to improve data locality. Our technique combines static and dynamic analyses and is capable of identifying the most promising pages to migrate. Statically, we infer the size of arrays, plus the amount of reuse of each memory access instruction in a program. These estimates rely on a simple, yet accurate, trip count predictor of our own design. This knowledge let's us build templates of dynamic checks, to be filled with values known only at runtime. These checks determine when it is profitable to migrate data closer to the processors where this data is used. Our static analyses are quadratic on the number of variables in a program, and the dynamic checks are O(1) in practice. Our technique does not require any form of user intervention, neither the support of a third-party middleware, nor modifications in the operating system's kernel. We have applied our technique on several parallel algorithms, which are completely oblivious to the asymmetric memory topology, and have observed speedups of up to 4×, compared to static heuristics. We compare our approach against Minas, a middleware that supports NUMA-aware data allocation, and show that we can outperform it by up to 50% in some cases.","PeriodicalId":263670,"journal":{"name":"2014 23rd International Conference on Parallel Architecture and Compilation (PACT)","volume":"99 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"38","resultStr":"{\"title\":\"Compiler support for selective page migration in NUMA architectures\",\"authors\":\"G. Piccoli, Henrique Nazaré, R. E. Rodrigues, C. Pousa, E. Borin, Fernando Magno Quintão Pereira\",\"doi\":\"10.1145/2628071.2628077\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Current high-performance multicore processors provide users with a non-uniform memory access model (NUMA). These systems perform better when threads access data on memory banks next to the core where they run. However, ensuring data locality is difficult. In this paper, we propose compiler analyses and code generation methods to support a lightweight runtime system that dynamically migrates memory pages to improve data locality. Our technique combines static and dynamic analyses and is capable of identifying the most promising pages to migrate. Statically, we infer the size of arrays, plus the amount of reuse of each memory access instruction in a program. These estimates rely on a simple, yet accurate, trip count predictor of our own design. This knowledge let's us build templates of dynamic checks, to be filled with values known only at runtime. These checks determine when it is profitable to migrate data closer to the processors where this data is used. Our static analyses are quadratic on the number of variables in a program, and the dynamic checks are O(1) in practice. 
Our technique does not require any form of user intervention, neither the support of a third-party middleware, nor modifications in the operating system's kernel. We have applied our technique on several parallel algorithms, which are completely oblivious to the asymmetric memory topology, and have observed speedups of up to 4×, compared to static heuristics. We compare our approach against Minas, a middleware that supports NUMA-aware data allocation, and show that we can outperform it by up to 50% in some cases.\",\"PeriodicalId\":263670,\"journal\":{\"name\":\"2014 23rd International Conference on Parallel Architecture and Compilation (PACT)\",\"volume\":\"99 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-08-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"38\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 23rd International Conference on Parallel Architecture and Compilation (PACT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2628071.2628077\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 23rd International Conference on Parallel Architecture and Compilation (PACT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2628071.2628077","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 38

Abstract

Current high-performance multicore processors provide users with a non-uniform memory access (NUMA) model. These systems perform better when threads access data on memory banks next to the core where they run. However, ensuring data locality is difficult. In this paper, we propose compiler analyses and code generation methods to support a lightweight runtime system that dynamically migrates memory pages to improve data locality. Our technique combines static and dynamic analyses and is capable of identifying the most promising pages to migrate. Statically, we infer the size of arrays, plus the amount of reuse of each memory access instruction in a program. These estimates rely on a simple, yet accurate, trip count predictor of our own design. This knowledge lets us build templates of dynamic checks, to be filled with values known only at runtime. These checks determine when it is profitable to migrate data closer to the processors where the data is used. Our static analyses are quadratic on the number of variables in a program, and the dynamic checks are O(1) in practice. Our technique does not require any form of user intervention: it needs neither the support of third-party middleware nor modifications to the operating system's kernel. We have applied our technique to several parallel algorithms, which are completely oblivious to the asymmetric memory topology, and have observed speedups of up to 4× compared to static heuristics. We compare our approach against Minas, a middleware that supports NUMA-aware data allocation, and show that we can outperform it by up to 50% in some cases.
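To make the mechanism in the abstract concrete, the following C sketch is a minimal, hypothetical illustration (not the authors' implementation) of a compiler-inserted guard of the kind described: a check template whose array size and trip count are filled in at runtime, with page migration performed only when the estimated reuse justifies the cost. It assumes Linux's move_pages(2) system call exposed through <numaif.h> (link with -lnuma); the names maybe_migrate, migrate_region, and MIGRATION_THRESHOLD are illustrative assumptions, not part of the paper.

/*
 * Hypothetical sketch of a guarded page migration, assuming Linux's
 * move_pages(2) from libnuma. Not the paper's code.
 */
#include <numaif.h>   /* move_pages, MPOL_MF_MOVE */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Move the pages backing [base, base + bytes) to NUMA node target_node. */
static void migrate_region(void *base, size_t bytes, int target_node)
{
    long page_size = sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t)base & ~((uintptr_t)page_size - 1);
    size_t npages = (bytes + (size_t)page_size - 1) / (size_t)page_size;

    void **pages  = malloc(npages * sizeof *pages);
    int   *nodes  = malloc(npages * sizeof *nodes);
    int   *status = malloc(npages * sizeof *status);
    if (!pages || !nodes || !status)
        goto out;

    for (size_t i = 0; i < npages; i++) {
        pages[i] = (void *)(start + i * (uintptr_t)page_size);
        nodes[i] = target_node;
    }
    /* pid == 0 means "the calling process". */
    if (move_pages(0, npages, pages, nodes, status, MPOL_MF_MOVE) != 0)
        perror("move_pages");
out:
    free(pages); free(nodes); free(status);
}

/*
 * Compiler-inserted guard: the static analysis produces only a template;
 * trip_count and array_bytes are filled with exact values at runtime.
 * MIGRATION_THRESHOLD stands in for an unspecified cost-model constant.
 */
#define MIGRATION_THRESHOLD 4.0

void maybe_migrate(void *array, size_t array_bytes,
                   long trip_count, int target_node)
{
    long page_size = sysconf(_SC_PAGESIZE);
    double accesses_per_page =
        (double)trip_count / ((double)array_bytes / (double)page_size);
    if (accesses_per_page > MIGRATION_THRESHOLD)
        migrate_region(array, array_bytes, target_node);
}

The guard itself is the O(1) part: a handful of arithmetic operations and one comparison before a parallel loop, while the migration cost is paid only when the reuse estimate says it will pay off.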