Latest publications from the Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction

One-shot tuner for deep learning compilers
Jaehun Ryu, Eunhyeok Park, Hyojin Sung
DOI: https://doi.org/10.1145/3497776.3517774
Published: 2022-03-18
Abstract: Auto-tuning DL compilers are gaining ground as an optimizing back-end for DL frameworks. While existing work can generate deep learning models that exceed the performance of hand-tuned libraries, they still suffer from prohibitively long auto-tuning time due to repeated hardware measurements in large search spaces. In this paper, we take a neural-predictor inspired approach to reduce the auto-tuning overhead and show that a performance predictor model trained prior to compilation can produce optimized tensor operation codes without repeated search and hardware measurements. To generate a sample-efficient training dataset, we extend input representation to include task-specific information and to guide data sampling methods to focus on learning high-performing codes. We evaluated the resulting predictor model, One-Shot Tuner, against AutoTVM and other prior work, and the results show that One-Shot Tuner speeds up compilation by 2.81x to 67.7x compared to prior work while providing comparable or improved inference time for CNN and Transformer models.
Citations: 5
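As a rough illustration of the predictor-based idea in the abstract (replacing repeated hardware measurement with a model that scores candidate schedules), the following self-contained C++ sketch ranks a handful of hypothetical tile/unroll configurations with a stand-in linear model. The knob names, features, and weights are invented for illustration and are not One-Shot Tuner's actual representation or network.

```cpp
// Minimal sketch (not One-Shot Tuner's actual model): a pre-trained cost
// predictor ranks candidate tensor-op configurations so that no hardware
// measurement is needed at compile time. Features and weights are invented.
#include <array>
#include <cstdio>
#include <vector>

struct Candidate {
    int tile_x, tile_y, unroll;          // hypothetical schedule knobs
};

// Hypothetical feature extraction: task-specific info (problem size) plus the
// candidate's knobs, standing in for the paper's much richer representation.
static std::array<double, 4> features(const Candidate& c, int problem_size) {
    return {double(c.tile_x), double(c.tile_y), double(c.unroll),
            double(problem_size)};
}

// Stand-in for a trained neural predictor: here just a fixed linear model.
static double predict_throughput(const std::array<double, 4>& f) {
    constexpr std::array<double, 4> w = {0.8, 0.6, 0.3, 0.001};
    double score = 0.0;
    for (size_t i = 0; i < f.size(); ++i) score += w[i] * f[i];
    return score;
}

int main() {
    std::vector<Candidate> search_space = {
        {8, 8, 2}, {16, 8, 4}, {32, 4, 1}, {16, 16, 8}};
    const int problem_size = 1024;

    const Candidate* best = nullptr;
    double best_score = -1.0;
    for (const auto& c : search_space) {
        double s = predict_throughput(features(c, problem_size));
        if (s > best_score) { best_score = s; best = &c; }  // no measurement
    }
    std::printf("picked tile=%dx%d unroll=%d (predicted score %.2f)\n",
                best->tile_x, best->tile_y, best->unroll, best_score);
    return 0;
}
```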
Performant portable OpenMP
Guray Ozen, M. Wolfe
DOI: https://doi.org/10.1145/3497776.3517780
Published: 2022-03-18
Abstract: Accelerated computing has increased the need to specialize how a program is parallelized depending on the target. Fully exploiting a highly parallel accelerator, such as a GPU, demands more parallelism and sometimes more levels of parallelism than a multicore CPU. OpenMP has a directive for each level of parallelism, but choosing directives for each target can incur a significant productivity cost. We argue that using the new OpenMP loop directive with an appropriate compiler decision process can achieve the same performance benefits of target-specific parallelization with the productivity advantage of a single directive for all targets. In this paper, we introduce a fully descriptive model and demonstrate its benefits with an implementation of the loop directive, comparing performance, productivity, and portability against other production compilers using the SPEC ACCEL benchmark suite. We provide an implementation of our proposal in NVIDIA's HPC compiler. It yields up to 56X speedup and an average of 1.91x-1.79x speedup compared to the baseline performance (depending on the host system) on GPUs, and preserves CPU performance. In addition, our proposal requires 60% fewer parallelism directives.
Citations: 4
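To make the prescriptive-versus-descriptive contrast in the abstract concrete, here is a minimal C++/OpenMP sketch: the first function spells out the GPU parallelism levels explicitly, while the second uses the OpenMP 5.x `loop` construct and leaves the mapping decision to the compiler. The kernel and map clauses are illustrative only; the paper's evaluation uses the SPEC ACCEL suite and NVIDIA's HPC compiler rather than this toy.

```cpp
// Sketch contrasting prescriptive, target-specific OpenMP parallelization
// with the descriptive `loop` directive the paper builds on. Compile with an
// OpenMP 5.x compiler configured for offloading (e.g. nvc++ or clang).
#include <vector>

// Prescriptive style: the programmer spells out every level of parallelism
// for the GPU, which a multicore CPU would want written differently.
void add_prescriptive(const double* a, const double* b, double* c, int n) {
    #pragma omp target teams distribute parallel for \
            map(to: a[0:n], b[0:n]) map(from: c[0:n])
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

// Descriptive style: `loop` only asserts that the iterations are independent
// and leaves the parallelization decision to the compiler, so the same source
// can serve GPUs and CPUs.
void add_descriptive(const double* a, const double* b, double* c, int n) {
    #pragma omp target teams loop map(to: a[0:n], b[0:n]) map(from: c[0:n])
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<double> a(n, 1.0), b(n, 2.0), c(n, 0.0);
    add_prescriptive(a.data(), b.data(), c.data(), n);
    add_descriptive(a.data(), b.data(), c.data(), n);
    return c[0] == 3.0 ? 0 : 1;
}
```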
Writing and verifying a Quantum optimizing compiler (keynote)
Robert Rand
DOI: https://doi.org/10.1145/3497776.3526941
Published: 2022-03-18
Abstract: As quantum computing hardware evolves, it will continue to face four key limitations: low qubit counts, limited connectivity, high error rates, and short coherence times. Quantum compilers play a key role in addressing these issues, reducing the number of qubits needed to perform a computation, mapping those qubits to the desired hardware, and minimizing the number of costly operations, both in terms of error rates and execution time. However, we cannot afford for compilers to become another source of bugs: Quantum computing is an inherently probabilistic and error-prone process and any additional sources of error are unlikely to be properly diagnosed. To address this, we present VOQC, a verified optimizing compiler for quantum circuits. VOQC heavily optimizes quantum programs while guaranteeing that the output is quantum-mechanically indistinguishable from the input program, up to permutation of qubits. This ensures that compilation produces an equivalent program that is executable on the given hardware. In this talk, we will address the key differences between classical and quantum compilation and the challenges unique to the latter. We will discuss the design decisions that underlie VOQC and how they enable its most powerful optimizations. Finally, we will discuss the developments since VOQC was first published, both within the VOQC toolchain and competing compilers, verified and unverified.
Citations: 0
Efficient profile-guided size optimization for native mobile applications
Kyungwoon Lee, Ellis Hoag, N. Tillmann
DOI: https://doi.org/10.1145/3497776.3517764
Published: 2022-03-18
Abstract: Positive user experience of mobile apps demands they not only launch fast and run fluidly, but are also small in order to reduce network bandwidth from regular updates. Conventional optimizations often trade off size regressions for performance wins, making them impractical in the mobile space. Indeed, profile-guided optimization (PGO) is successful in server workloads, but is not effective at reducing size and page faults for mobile apps. Also, profiles must be collected from instrumenting builds that are up to 2X larger, so they cannot run normally on real mobile devices. In this paper, we first introduce Machine IR Profile (MIP), a lightweight instrumentation that runs at the machine IR level. Unlike the existing LLVM IR instrumentation counterpart, MIP withholds static metadata from the instrumenting binaries leading to a 2/3 reduction in size overhead. In addition, MIP collects profile data that is more relevant to optimizations in the mobile space. Then we propose three improvements to the LLVM machine outliner: (i) the global outliner overcomes the local scope of the machine outliner when using ThinLTO, (ii) the frame outliner effectively outlines irregular prologues and epilogues, and (iii) the custom outliner outlines frequent patterns occurring in Objective-C and Swift. Lastly, we present our PGO that orders hot start-up functions to minimize page faults, and controls the size optimization level (-Os vs -Oz) for functions based on their estimated execution time driven from MIP. We also order cold functions based on similarity to minimize the compressed app size. Our work improves both the size and performance of real-world mobile apps when compared to the MinSize (-Oz) optimization level: (i) in SocialApp, we reduced the compressed app size by 5.2%, the uncompressed app size by 9.6% and the page faults by 20.6%, and (ii) in ChatApp, we reduced them by 2.4%, 4.6% and 36.4%, respectively.
Citations: 5
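A minimal sketch, assuming an invented per-function profile record, of the policy the abstract describes: order hot start-up functions first to reduce launch-time page faults, and pick -Os versus -Oz per function from its estimated execution time. The threshold, field names, and data layout here are hypothetical and are not the paper's MIP format.

```cpp
// Minimal sketch (not the paper's implementation) of the described policy:
// functions with high profile-estimated execution time are compiled at -Os
// and placed early to cut start-up page faults; cold functions get -Oz.
#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

struct FuncProfile {
    std::string name;
    double est_exec_time;  // e.g. derived from MIP-style lightweight counters
    bool startup;          // executed during app start-up
};

enum class SizeLevel { Os, Oz };

static SizeLevel choose_level(const FuncProfile& f, double hot_threshold) {
    return f.est_exec_time >= hot_threshold ? SizeLevel::Os : SizeLevel::Oz;
}

int main() {
    std::vector<FuncProfile> funcs = {
        {"init_ui",    9.0, true},
        {"parse_json", 4.5, false},
        {"rarely_hit", 0.1, false},
        {"main_loop",  8.0, true},
    };

    // Hot start-up functions first, ordered by estimated execution time, so
    // the pages they occupy are touched contiguously at launch.
    std::stable_sort(funcs.begin(), funcs.end(),
                     [](const FuncProfile& a, const FuncProfile& b) {
                         if (a.startup != b.startup) return a.startup;
                         return a.est_exec_time > b.est_exec_time;
                     });

    for (const auto& f : funcs) {
        SizeLevel lvl = choose_level(f, /*hot_threshold=*/1.0);
        std::printf("%-12s -> %s\n", f.name.c_str(),
                    lvl == SizeLevel::Os ? "-Os" : "-Oz");
    }
    return 0;
}
```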
A polynomial time exact solution to the bit-aware register binding problem
Michael Canesche, Ricardo Ferreira, J. Nacif, Fernando Magno Quintão Pereira
DOI: https://doi.org/10.1145/3497776.3517773
Published: 2022-03-18
Abstract: Finding the minimum register bank is an optimization problem related to the synthesis of hardware. Given a program, the problem asks for the minimum number of registers plus their minimum size, in bits, that suffices to compile said program. This problem is NP-complete; hence, usually solved via heuristics. In this paper, we show that this problem has an optimal solution in polynomial time, as long as swaps can be inserted in the program to move variables across registers. This observation sets a lower bound to heuristics that minimize the size of register banks. We have compared the optimal algorithm with two classic heuristics. Our approach uses, on average, 6 to 10% less bits than that previous work.
Citations: 0
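For intuition about what "bit-aware" adds to register binding, the sketch below computes, at every program point, how many variables are live and how many bits they need in total; any bank that binds these variables without spilling must provide at least that many registers and bits at that point. This is only the lower-bound view of the problem, with made-up live ranges, not the paper's polynomial-time optimal algorithm.

```cpp
// Illustrative sketch (not the paper's algorithm): given live ranges
// annotated with bit widths, compute the register count and total bit demand
// at each program point; the maxima are lower bounds on any register bank.
#include <algorithm>
#include <cstdio>
#include <vector>

struct LiveRange {
    int start, end;   // program points [start, end], inclusive
    int width_bits;   // e.g. a 1-bit flag vs. a 32-bit counter
};

int main() {
    std::vector<LiveRange> vars = {
        {0, 4, 32},   // v0
        {2, 6, 1},    // v1
        {3, 8, 16},   // v2
        {7, 9, 32},   // v3
    };

    int last_point = 0;
    for (const auto& v : vars) last_point = std::max(last_point, v.end);

    int max_regs = 0, max_bits = 0;
    for (int p = 0; p <= last_point; ++p) {
        int regs = 0, bits = 0;
        for (const auto& v : vars)
            if (v.start <= p && p <= v.end) { ++regs; bits += v.width_bits; }
        max_regs = std::max(max_regs, regs);
        max_bits = std::max(max_bits, bits);
    }
    std::printf("lower bound: %d registers, %d bits in total\n",
                max_regs, max_bits);
    return 0;
}
```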
Making no-fuss compiler fuzzing effective
Alex Groce, Rijnard van Tonder, G. Kalburgi, Claire Le Goues
DOI: https://doi.org/10.1145/3497776.3517765
Published: 2022-03-18
Abstract: Developing a bug-free compiler is difficult; modern optimizing compilers are among the most complex software systems humans build. Fuzzing is one way to identify subtle compiler bugs that are hard to find with human-constructed tests. Grammar-based fuzzing, however, requires a grammar for a compiler’s input language, and can miss bugs induced by code that does not actually satisfy the grammar the compiler should accept. Grammar-based fuzzing also seldom uses advanced modern fuzzing techniques based on coverage feedback. However, modern mutation-based fuzzers are often ineffective for testing compilers because most inputs they generate do not even come close to getting past the parsing stage of compilation. This paper introduces a technique for taking a modern mutation-based fuzzer (AFL in our case, but the method is general) and augmenting it with operators taken from mutation testing, and program splicing. We conduct a controlled study to show that our hybrid approaches significantly improve fuzzing effectiveness qualitatively (consistently finding unique bugs that baseline approaches do not) and quantitatively (typically finding more unique bugs in the same time span, despite fewer program executions). Our easy-to-apply approach has allowed us to report more than 100 confirmed and fixed bugs in production compilers, and found a bug in the Solidity compiler that earned a security bounty.
Citations: 5
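The splicing operator mentioned in the abstract can be pictured with a toy sketch: combine a random prefix of one seed test case with a random suffix of another, so the result is more likely to resemble parseable input than raw byte mutations. This is an illustration of the idea only, not the paper's AFL integration or its mutation-testing operators.

```cpp
// Toy sketch of program splicing for compiler fuzzing: glue a prefix of one
// seed program to a suffix of another to form a new candidate input.
#include <cstdio>
#include <random>
#include <string>

static std::string splice(const std::string& a, const std::string& b,
                          std::mt19937& rng) {
    if (a.empty() || b.empty()) return a + b;
    std::uniform_int_distribution<size_t> cut_a(0, a.size() - 1);
    std::uniform_int_distribution<size_t> cut_b(0, b.size() - 1);
    return a.substr(0, cut_a(rng)) + b.substr(cut_b(rng));
}

int main() {
    std::mt19937 rng(42);
    std::string seed1 = "int f(int x) { return x + 1; }";
    std::string seed2 = "int g(void) { for (int i = 0; i < 3; ++i) {} return 0; }";
    for (int i = 0; i < 3; ++i)
        std::printf("candidate %d: %s\n", i, splice(seed1, seed2, rng).c_str());
    return 0;
}
```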
Seamless deductive inference via macros
Arash Sahebolamri, Thomas Gilray, Kristopher K. Micinski
DOI: https://doi.org/10.1145/3497776.3517779
Published: 2022-03-18
Abstract: We present an approach to integrating state-of-art bottom-up logic programming within the Rust ecosystem, demonstrating it with Ascent, an extension of Datalog that performs well against comparable systems. Rust’s powerful macro system permits Ascent to be compiled uniformly with the Rust code it’s embedded in and to interoperate with arbitrary user-defined components written in Rust, addressing a challenge in real-world use of logic programming languages: the fact that logical programs are parts of bigger software systems and need to interoperate with other components written in imperative programming languages. We leverage Rust’s trait system to extend Datalog semantics with non-powerset lattices, much like Flix, and with user-defined data types much like Formulog and Souffle. We use Ascent to re-implement the Rust borrow checker, a static analysis required by the Rust compiler. We evaluate our performance against Datafrog, Flix, and Soufflé using the borrow checker and other benchmarks, observing comparable performance to Datafrog and Soufflé, and speedups of around two orders of magnitude compared to Flix.
Citations: 6
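Ascent itself is a Rust macro, so the snippet below is not Ascent code; it is a language-neutral C++ sketch of the bottom-up, semi-naive fixpoint evaluation that Datalog rules such as path(x,z) :- edge(x,y), path(y,z) boil down to, shown on a transitive-closure example with a hard-coded edge relation.

```cpp
// Semi-naive bottom-up evaluation of transitive closure, the evaluation
// scheme that Datalog engines like Ascent compile rules into.
#include <cstdio>
#include <set>
#include <utility>

using Fact = std::pair<int, int>;

int main() {
    std::set<Fact> edge = {{1, 2}, {2, 3}, {3, 4}};

    // path(x, y) :- edge(x, y).
    std::set<Fact> path = edge;
    std::set<Fact> delta = edge;   // facts derived in the previous round

    // path(x, z) :- edge(x, y), path(y, z).
    // Semi-naive: only join against facts that are new this round.
    while (!delta.empty()) {
        std::set<Fact> next;
        for (const auto& [x, y] : edge)
            for (const auto& [y2, z] : delta)
                if (y == y2 && !path.count({x, z})) next.insert({x, z});
        for (const auto& f : next) path.insert(f);
        delta = std::move(next);
    }

    for (const auto& [x, y] : path) std::printf("path(%d, %d)\n", x, y);
    return 0;
}
```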
BinPointer: towards precise, sound, and scalable binary-level pointer analysis
Sun Hyoung Kim, Dongrui Zeng, Cong Sun, Gang Tan
DOI: https://doi.org/10.1145/3497776.3517776
Published: 2022-03-18
Abstract: Binary-level pointer analysis is critical to binary-level applications such as reverse engineering and binary debloating. In this paper, we propose BinPointer, a new binary-level interprocedural pointer analysis that relies on an offset-sensitive value-tracking analysis to achieve high precision. We also propose a soundness and precision evaluation methodology based on runtime memory accesses triggered by reference input data. Our experimental results demonstrate that BinPointer has higher precision over prior work, while maintaining acceptable scalability. The soundness of BinPointer is also validated through runtime data.
Citations: 2
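As a much-simplified picture of why offset sensitivity matters in binary-level value tracking, the toy sketch below keeps a (base, offset) abstract value per register for a three-instruction straight-line "binary" in an invented instruction form. BinPointer's interprocedural, memory-aware analysis is far richer; every opcode and name here is made up for illustration.

```cpp
// Toy offset-sensitive value tracking: because the abstract value records a
// base plus a concrete offset, the load below resolves to GLOBAL_TABLE+8
// rather than just "somewhere in GLOBAL_TABLE".
#include <cstdio>
#include <map>
#include <string>
#include <vector>

struct AbsVal { bool known = false; std::string base; long offset = 0; };

struct Insn { std::string op, dst, src; long imm; };

int main() {
    // A tiny straight-line "binary" in a made-up three-address form.
    std::vector<Insn> code = {
        {"lea",  "r1", "GLOBAL_TABLE", 0},  // r1 = &GLOBAL_TABLE
        {"addi", "r1", "r1", 8},            // r1 += 8
        {"load", "r2", "r1", 0},            // r2 = *[r1 + 0]
    };

    std::map<std::string, AbsVal> reg;
    for (const auto& in : code) {
        if (in.op == "lea") {
            reg[in.dst] = {true, in.src, 0};
        } else if (in.op == "addi") {
            AbsVal v = reg[in.src];
            v.offset += in.imm;
            reg[in.dst] = v;                 // stays offset-precise
        } else if (in.op == "load") {
            const AbsVal p = reg[in.src];
            if (p.known)
                std::printf("load reads %s+%ld\n", p.base.c_str(),
                            p.offset + in.imm);
            else
                std::printf("load reads an unknown address\n");
            reg[in.dst] = {};                // loaded value: unknown here
        }
    }
    return 0;
}
```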
On the computation of interprocedural weak control closure
A. Masud, B. Lisper
DOI: https://doi.org/10.1145/3497776.3517782
Published: 2022-03-18
Abstract: Many program analysis techniques depend on capturing the control dependencies of the program. Most existing control dependence algorithms either compute intraprocedural control dependencies only, or they compute control dependence relations that are not precise in general including nonterminating systems. Weak control closure (WCC) subsumes all known nontermination insensitive control dependence relations, including those that are appropriate for nonterminating systems. In this paper, we provide the first formal development of an algorithm to compute the WCC for interprocedural programs capturing the weak form of interprocedural control dependencies. The method is widely applicable due to the generality of WCC. Theorems on the theoretical results of soundness, precision, and the worst-case complexity of our method are also included. We have compared our algorithm with a WCC computation algorithm based on a state-of-the-art interprocedural control dependence computation algorithm. The latter algorithm loses soundness, and we improve the precision by 15.21% on all our experimental benchmarks. This empirical evidence suggests that our algorithm is more effective for any client application of WCC requiring interprocedural program analysis.
Citations: 1
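For context on what weak control closure generalizes, here is a textbook sketch of classical intraprocedural, nontermination-insensitive control dependence computed from post-dominator sets on a hard-coded four-node CFG. The paper's contribution, an interprocedural WCC algorithm with soundness and precision guarantees, goes well beyond this sketch.

```cpp
// Classical control dependence from post-dominator sets (the notion that WCC
// subsumes), computed the textbook way on a small hard-coded CFG.
#include <cstdio>
#include <set>
#include <vector>

int main() {
    // CFG: 0 -> {1, 2}, 1 -> {3}, 2 -> {3}, 3 is the exit. Node 0 branches.
    const int n = 4, exit_node = 3;
    std::vector<std::vector<int>> succ = {{1, 2}, {3}, {3}, {}};

    // Post-dominator sets by fixpoint:
    // pdom(exit) = {exit}; pdom(v) = {v} plus the intersection of pdom(s)
    // over all successors s of v.
    std::set<int> all;
    for (int v = 0; v < n; ++v) all.insert(v);
    std::vector<std::set<int>> pdom(n, all);
    pdom[exit_node] = {exit_node};

    bool changed = true;
    while (changed) {
        changed = false;
        for (int v = 0; v < n; ++v) {
            if (v == exit_node) continue;
            std::set<int> meet = all;
            for (int s : succ[v]) {
                std::set<int> tmp;
                for (int x : meet) if (pdom[s].count(x)) tmp.insert(x);
                meet = tmp;
            }
            meet.insert(v);
            if (meet != pdom[v]) { pdom[v] = meet; changed = true; }
        }
    }

    // w is control dependent on v iff w post-dominates some successor of v
    // but does not strictly post-dominate v.
    for (int v = 0; v < n; ++v)
        for (int w = 0; w < n; ++w) {
            bool some_succ = false;
            for (int s : succ[v]) some_succ |= pdom[s].count(w) > 0;
            bool strictly = (w != v) && pdom[v].count(w) > 0;
            if (some_succ && !strictly)
                std::printf("%d is control dependent on %d\n", w, v);
        }
    return 0;
}
```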
Software pre-execution for irregular memory accesses in the HBM era
Sanyam Mehta, G. Elsesser, Terry Greyzck
DOI: https://doi.org/10.1145/3497776.3517783
Published: 2022-03-18
Abstract: The introduction of High Bandwidth Memory (HBM) necessitates the use of intelligent software prefetching in irregular applications to utilize the surplus bandwidth. In this work, we propose Software Pre-execution (SPE), a technique that relies on pre-executing a minimal copy of the loop of concern (we call the pre-execution loop) for the purpose of prefetching irregular accesses. This is complemented by the compiler's enforcing a certain prefetch distance through apriori strip-mining of the original loop such that the execution of the pre-execution loop is interspersed with the main loop to ensure timeliness of prefetches. We find that this approach provides natural advantages over prior art such as preservation of loop vectorization, handling short loops, avoiding performance bottlenecks, amenability to threading and most importantly, effective coverage. We demonstrate these advantages using a variety of benchmarks on Fujitsu's A64FX processor with HBM2 memory - we outperform prior art by 1.3x and 1.2x when using small and huge pages, respectively. Simulations further show that our approach holds stronger promise on upcoming processors with HBM2e.
Citations: 1
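One possible shape of the code this transformation aims to produce can be sketched by hand: strip-mine the loop and, before each strip of the main (still vectorizable) loop, run a minimal pre-execution copy that only issues prefetches for the irregular accesses. The strip size here is arbitrary, __builtin_prefetch is the GCC/Clang builtin, and the compiler-generated code in the paper is more elaborate than this sketch.

```cpp
// Hand-written sketch of a strip-mined loop with a pre-execution copy that
// prefetches the irregular accesses b[idx[i]] before the main loop uses them.
#include <algorithm>
#include <cstddef>
#include <vector>

double gather_sum(const std::vector<double>& b, const std::vector<int>& idx) {
    constexpr std::size_t kStrip = 256;   // enforces the prefetch distance
    const std::size_t n = idx.size();
    double sum = 0.0;

    for (std::size_t lo = 0; lo < n; lo += kStrip) {
        const std::size_t hi = std::min(lo + kStrip, n);

        // Pre-execution loop: minimal copy that only computes addresses and
        // issues prefetches; no stores, no side effects.
        for (std::size_t i = lo; i < hi; ++i)
            __builtin_prefetch(&b[idx[i]], /*rw=*/0, /*locality=*/1);

        // Main loop: the original (still vectorizable) computation.
        for (std::size_t i = lo; i < hi; ++i)
            sum += b[idx[i]];
    }
    return sum;
}

int main() {
    std::vector<double> b(1 << 20, 1.0);
    std::vector<int> idx;
    for (int i = 0; i < (1 << 20); ++i)
        idx.push_back(static_cast<int>((1LL * i * 7919) % (1 << 20)));
    return gather_sum(b, idx) > 0 ? 0 : 1;
}
```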