Rapid: Region-Based Pointer Disambiguation

IF 2.2 Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING
Khushboo Chitre, Piyus Kedia, Rahul Purandare
{"title":"Rapid: Region-Based Pointer Disambiguation","authors":"Khushboo Chitre, Piyus Kedia, Rahul Purandare","doi":"10.1145/3622859","DOIUrl":null,"url":null,"abstract":"Interprocedural alias analyses often sacrifice precision for scalability. Thus, modern compilers such as GCC and LLVM implement more scalable but less precise intraprocedural alias analyses. This compromise makes the compilers miss out on potential optimization opportunities, affecting the performance of the application. Modern compilers implement loop-versioning with dynamic checks for pointer disambiguation to enable the missed optimizations. Polyhedral access range analysis and symbolic range analysis enable 𝑂 (1) range checks for non-overlapping of memory accesses inside loops. However, these approaches work only for the loops in which the loop bounds are loop invariants. To address this limitation, researchers proposed a technique that requires 𝑂 (𝑙𝑜𝑔 𝑛) memory accesses for pointer disambiguation. Others improved the performance of dynamic checks to single memory access by constraining the object size and alignment. However, the former approach incurs noticeable overhead due to its dynamic checks, whereas the latter has a noticeable allocator overhead. Thus, scalability remains a challenge. In this work, we present a tool, Rapid, that further reduces the overheads of the allocator and dynamic checks proposed in the existing approaches. The key idea is to identify objects that need disambiguation checks using a profiler and allocate them in different regions, which are disjoint memory areas. The disambiguation checks simply compare the regions corresponding to the objects. The regions are aligned such that the top 32 bits in the addresses of any two objects allocated in different regions are always different. As a consequence, the dynamic checks do not require any memory access to ensure that the objects belong to different regions, making them efficient. Rapid achieved a maximum performance benefit of around 52.94% for Polybench and 1.88% for CPU SPEC 2017 benchmarks. The maximum CPU overhead of our allocator is 0.57% with a geometric mean of -0.2% for CPU SPEC 2017 benchmarks. Due to the low overhead of the allocator and dynamic checks, Rapid could improve the performance of 12 out of 16 CPU SPEC 2017 benchmarks. In contrast, a state-of-the-art approach used in the comparison could improve only five CPU SPEC 2017 benchmarks.","PeriodicalId":20697,"journal":{"name":"Proceedings of the ACM on Programming Languages","volume":"27 1","pages":"0"},"PeriodicalIF":2.2000,"publicationDate":"2023-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM on Programming Languages","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3622859","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 0

Abstract

Interprocedural alias analyses often sacrifice precision for scalability. Thus, modern compilers such as GCC and LLVM implement more scalable but less precise intraprocedural alias analyses. This compromise makes the compilers miss out on potential optimization opportunities, affecting the performance of the application. Modern compilers implement loop-versioning with dynamic checks for pointer disambiguation to enable the missed optimizations. Polyhedral access range analysis and symbolic range analysis enable 𝑂 (1) range checks for non-overlapping of memory accesses inside loops. However, these approaches work only for the loops in which the loop bounds are loop invariants. To address this limitation, researchers proposed a technique that requires 𝑂 (𝑙𝑜𝑔 𝑛) memory accesses for pointer disambiguation. Others improved the performance of dynamic checks to single memory access by constraining the object size and alignment. However, the former approach incurs noticeable overhead due to its dynamic checks, whereas the latter has a noticeable allocator overhead. Thus, scalability remains a challenge. In this work, we present a tool, Rapid, that further reduces the overheads of the allocator and dynamic checks proposed in the existing approaches. The key idea is to identify objects that need disambiguation checks using a profiler and allocate them in different regions, which are disjoint memory areas. The disambiguation checks simply compare the regions corresponding to the objects. The regions are aligned such that the top 32 bits in the addresses of any two objects allocated in different regions are always different. As a consequence, the dynamic checks do not require any memory access to ensure that the objects belong to different regions, making them efficient. Rapid achieved a maximum performance benefit of around 52.94% for Polybench and 1.88% for CPU SPEC 2017 benchmarks. The maximum CPU overhead of our allocator is 0.57% with a geometric mean of -0.2% for CPU SPEC 2017 benchmarks. Due to the low overhead of the allocator and dynamic checks, Rapid could improve the performance of 12 out of 16 CPU SPEC 2017 benchmarks. In contrast, a state-of-the-art approach used in the comparison could improve only five CPU SPEC 2017 benchmarks.
快速:基于区域的指针消歧
过程间别名分析常常为了可伸缩性而牺牲精度。因此,像GCC和LLVM这样的现代编译器实现了更大的可伸缩性,但更不精确的过程内别名分析。这种折衷会使编译器错过潜在的优化机会,从而影响应用程序的性能。现代编译器通过动态检查指针消歧来实现循环版本控制,以启用错过的优化。多面体访问范围分析和符号范围分析使𝑂(1)范围检查循环内内存访问的不重叠。然而,这些方法只适用于循环边界为循环不变量的循环。为了解决这一限制,研究人员提出了一种需要𝑂(𝑙𝑜𝑔𝑛)内存访问来消除指针歧义的技术。其他人通过限制对象大小和对齐来提高单内存访问的动态检查的性能。但是,前一种方法由于其动态检查而产生明显的开销,而后一种方法具有明显的分配器开销。因此,可伸缩性仍然是一个挑战。在这项工作中,我们提出了一个工具,Rapid,它进一步降低了分配器的开销和现有方法中提出的动态检查。关键思想是使用分析器识别需要消歧检查的对象,并将它们分配到不同的区域,这些区域是不相交的内存区域。消歧检查只是比较对象对应的区域。这些区域是对齐的,因此在不同区域分配的任意两个对象的地址的前32位总是不同的。因此,动态检查不需要任何内存访问来确保对象属于不同的区域,从而提高了检查的效率。Rapid在Polybench上实现了52.94%的最大性能优势,在CPU SPEC 2017基准测试中实现了1.88%的最大性能优势。我们的分配器的最大CPU开销为0.57%,CPU SPEC 2017基准测试的几何平均值为-0.2%。由于分配器和动态检查的低开销,Rapid可以提高16个CPU SPEC 2017基准中的12个的性能。相比之下,在比较中使用的最先进的方法只能提高五个CPU SPEC 2017基准。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Proceedings of the ACM on Programming Languages
Proceedings of the ACM on Programming Languages Engineering-Safety, Risk, Reliability and Quality
CiteScore
5.20
自引率
22.20%
发文量
192
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信