Latest publications from the Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction

One-shot tuner for deep learning compilers
Jaehun Ryu, Eunhyeok Park, Hyojin Sung
DOI: https://doi.org/10.1145/3497776.3517774
Published: 2022-03-18
Abstract: Auto-tuning DL compilers are gaining ground as an optimizing back-end for DL frameworks. While existing work can generate deep learning models that exceed the performance of hand-tuned libraries, they still suffer from prohibitively long auto-tuning time due to repeated hardware measurements in large search spaces. In this paper, we take a neural-predictor inspired approach to reduce the auto-tuning overhead and show that a performance predictor model trained prior to compilation can produce optimized tensor operation codes without repeated search and hardware measurements. To generate a sample-efficient training dataset, we extend input representation to include task-specific information and to guide data sampling methods to focus on learning high-performing codes. We evaluated the resulting predictor model, One-Shot Tuner, against AutoTVM and other prior work, and the results show that One-Shot Tuner speeds up compilation by 2.81x to 67.7x compared to prior work while providing comparable or improved inference time for CNN and Transformer models.
Citations: 5
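As a rough illustration of the predictor-based idea in the abstract (replacing repeated hardware measurement with a model that scores candidate schedules), the following self-contained C++ sketch ranks a handful of hypothetical tile/unroll configurations with a stand-in linear model. The knob names, features, and weights are invented for illustration and are not One-Shot Tuner's actual representation or network.

```cpp
// Minimal sketch (not One-Shot Tuner's actual model): a pre-trained cost
// predictor ranks candidate tensor-op configurations so that no hardware
// measurement is needed at compile time. Features and weights are invented.
#include <array>
#include <cstdio>
#include <vector>

struct Candidate {
    int tile_x, tile_y, unroll;          // hypothetical schedule knobs
};

// Hypothetical feature extraction: task-specific info (problem size) plus the
// candidate's knobs, standing in for the paper's much richer representation.
static std::array<double, 4> features(const Candidate& c, int problem_size) {
    return {double(c.tile_x), double(c.tile_y), double(c.unroll),
            double(problem_size)};
}

// Stand-in for a trained neural predictor: here just a fixed linear model.
static double predict_throughput(const std::array<double, 4>& f) {
    constexpr std::array<double, 4> w = {0.8, 0.6, 0.3, 0.001};
    double score = 0.0;
    for (size_t i = 0; i < f.size(); ++i) score += w[i] * f[i];
    return score;
}

int main() {
    std::vector<Candidate> search_space = {
        {8, 8, 2}, {16, 8, 4}, {32, 4, 1}, {16, 16, 8}};
    const int problem_size = 1024;

    const Candidate* best = nullptr;
    double best_score = -1.0;
    for (const auto& c : search_space) {
        double s = predict_throughput(features(c, problem_size));
        if (s > best_score) { best_score = s; best = &c; }  // no measurement
    }
    std::printf("picked tile=%dx%d unroll=%d (predicted score %.2f)\n",
                best->tile_x, best->tile_y, best->unroll, best_score);
    return 0;
}
```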
Performant portable OpenMP
Guray Ozen, M. Wolfe
DOI: https://doi.org/10.1145/3497776.3517780
Published: 2022-03-18
Abstract: Accelerated computing has increased the need to specialize how a program is parallelized depending on the target. Fully exploiting a highly parallel accelerator, such as a GPU, demands more parallelism and sometimes more levels of parallelism than a multicore CPU. OpenMP has a directive for each level of parallelism, but choosing directives for each target can incur a significant productivity cost. We argue that using the new OpenMP loop directive with an appropriate compiler decision process can achieve the same performance benefits of target-specific parallelization with the productivity advantage of a single directive for all targets. In this paper, we introduce a fully descriptive model and demonstrate its benefits with an implementation of the loop directive, comparing performance, productivity, and portability against other production compilers using the SPEC ACCEL benchmark suite. We provide an implementation of our proposal in NVIDIA's HPC compiler. It yields up to 56X speedup and an average of 1.91x-1.79x speedup compared to the baseline performance (depending on the host system) on GPUs, and preserves CPU performance. In addition, our proposal requires 60% fewer parallelism directives.
Citations: 4
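To make the prescriptive-versus-descriptive contrast in the abstract concrete, here is a minimal C++/OpenMP sketch: the first function spells out the GPU parallelism levels explicitly, while the second uses the OpenMP 5.x `loop` construct and leaves the mapping decision to the compiler. The kernel and map clauses are illustrative only; the paper's evaluation uses the SPEC ACCEL suite and NVIDIA's HPC compiler rather than this toy.

```cpp
// Sketch contrasting prescriptive, target-specific OpenMP parallelization
// with the descriptive `loop` directive the paper builds on. Compile with an
// OpenMP 5.x compiler configured for offloading (e.g. nvc++ or clang).
#include <vector>

// Prescriptive style: the programmer spells out every level of parallelism
// for the GPU, which a multicore CPU would want written differently.
void add_prescriptive(const double* a, const double* b, double* c, int n) {
    #pragma omp target teams distribute parallel for \
            map(to: a[0:n], b[0:n]) map(from: c[0:n])
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

// Descriptive style: `loop` only asserts that the iterations are independent
// and leaves the parallelization decision to the compiler, so the same source
// can serve GPUs and CPUs.
void add_descriptive(const double* a, const double* b, double* c, int n) {
    #pragma omp target teams loop map(to: a[0:n], b[0:n]) map(from: c[0:n])
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<double> a(n, 1.0), b(n, 2.0), c(n, 0.0);
    add_prescriptive(a.data(), b.data(), c.data(), n);
    add_descriptive(a.data(), b.data(), c.data(), n);
    return c[0] == 3.0 ? 0 : 1;
}
```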
Writing and verifying a Quantum optimizing compiler (keynote)
Robert Rand
DOI: https://doi.org/10.1145/3497776.3526941
Published: 2022-03-18
Abstract: As quantum computing hardware evolves, it will continue to face four key limitations: low qubit counts, limited connectivity, high error rates, and short coherence times. Quantum compilers play a key role in addressing these issues, reducing the number of qubits needed to perform a computation, mapping those qubits to the desired hardware, and minimizing the number of costly operations, both in terms of error rates and execution time. However, we cannot afford for compilers to become another source of bugs: Quantum computing is an inherently probabilistic and error-prone process and any additional sources of error are unlikely to be properly diagnosed. To address this, we present VOQC, a verified optimizing compiler for quantum circuits. VOQC heavily optimizes quantum programs while guaranteeing that the output is quantum-mechanically indistinguishable from the input program, up to permutation of qubits. This ensures that compilation produces an equivalent program that is executable on the given hardware. In this talk, we will address the key differences between classical and quantum compilation and the challenges unique to the latter. We will discuss the design decisions that underlie VOQC and how they enable its most powerful optimizations. Finally, we will discuss the developments since VOQC was first published, both within the VOQC toolchain and competing compilers, verified and unverified.
Citations: 0
Efficient profile-guided size optimization for native mobile applications
Kyungwoon Lee, Ellis Hoag, N. Tillmann
DOI: https://doi.org/10.1145/3497776.3517764
Published: 2022-03-18
Abstract: Positive user experience of mobile apps demands they not only launch fast and run fluidly, but are also small in order to reduce network bandwidth from regular updates. Conventional optimizations often trade off size regressions for performance wins, making them impractical in the mobile space. Indeed, profile-guided optimization (PGO) is successful in server workloads, but is not effective at reducing size and page faults for mobile apps. Also, profiles must be collected from instrumenting builds that are up to 2X larger, so they cannot run normally on real mobile devices. In this paper, we first introduce Machine IR Profile (MIP), a lightweight instrumentation that runs at the machine IR level. Unlike the existing LLVM IR instrumentation counterpart, MIP withholds static metadata from the instrumenting binaries leading to a 2/3 reduction in size overhead. In addition, MIP collects profile data that is more relevant to optimizations in the mobile space. Then we propose three improvements to the LLVM machine outliner: (i) the global outliner overcomes the local scope of the machine outliner when using ThinLTO, (ii) the frame outliner effectively outlines irregular prologues and epilogues, and (iii) the custom outliner outlines frequent patterns occurring in Objective-C and Swift. Lastly, we present our PGO that orders hot start-up functions to minimize page faults, and controls the size optimization level (-Os vs -Oz) for functions based on their estimated execution time driven from MIP. We also order cold functions based on similarity to minimize the compressed app size. Our work improves both the size and performance of real-world mobile apps when compared to the MinSize (-Oz) optimization level: (i) in SocialApp, we reduced the compressed app size by 5.2%, the uncompressed app size by 9.6% and the page faults by 20.6%, and (ii) in ChatApp, we reduced them by 2.4%, 4.6% and 36.4%, respectively.
Citations: 5
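A minimal sketch, assuming an invented per-function profile record, of the policy the abstract describes: order hot start-up functions first to reduce launch-time page faults, and pick -Os versus -Oz per function from its estimated execution time. The threshold, field names, and data layout here are hypothetical and are not the paper's MIP format.

```cpp
// Minimal sketch (not the paper's implementation) of the described policy:
// functions with high profile-estimated execution time are compiled at -Os
// and placed early to cut start-up page faults; cold functions get -Oz.
#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

struct FuncProfile {
    std::string name;
    double est_exec_time;  // e.g. derived from MIP-style lightweight counters
    bool startup;          // executed during app start-up
};

enum class SizeLevel { Os, Oz };

static SizeLevel choose_level(const FuncProfile& f, double hot_threshold) {
    return f.est_exec_time >= hot_threshold ? SizeLevel::Os : SizeLevel::Oz;
}

int main() {
    std::vector<FuncProfile> funcs = {
        {"init_ui",    9.0, true},
        {"parse_json", 4.5, false},
        {"rarely_hit", 0.1, false},
        {"main_loop",  8.0, true},
    };

    // Hot start-up functions first, ordered by estimated execution time, so
    // the pages they occupy are touched contiguously at launch.
    std::stable_sort(funcs.begin(), funcs.end(),
                     [](const FuncProfile& a, const FuncProfile& b) {
                         if (a.startup != b.startup) return a.startup;
                         return a.est_exec_time > b.est_exec_time;
                     });

    for (const auto& f : funcs) {
        SizeLevel lvl = choose_level(f, /*hot_threshold=*/1.0);
        std::printf("%-12s -> %s\n", f.name.c_str(),
                    lvl == SizeLevel::Os ? "-Os" : "-Oz");
    }
    return 0;
}
```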
A polynomial time exact solution to the bit-aware register binding problem
Michael Canesche, Ricardo Ferreira, J. Nacif, Fernando Magno Quintão Pereira
DOI: https://doi.org/10.1145/3497776.3517773
Published: 2022-03-18
Abstract: Finding the minimum register bank is an optimization problem related to the synthesis of hardware. Given a program, the problem asks for the minimum number of registers plus their minimum size, in bits, that suffices to compile said program. This problem is NP-complete; hence, usually solved via heuristics. In this paper, we show that this problem has an optimal solution in polynomial time, as long as swaps can be inserted in the program to move variables across registers. This observation sets a lower bound to heuristics that minimize the size of register banks. We have compared the optimal algorithm with two classic heuristics. Our approach uses, on average, 6 to 10% less bits than that previous work.
Citations: 0
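For intuition about what "bit-aware" adds to register binding, the sketch below computes, at every program point, how many variables are live and how many bits they need in total; any bank that binds these variables without spilling must provide at least that many registers and bits at that point. This is only the lower-bound view of the problem, with made-up live ranges, not the paper's polynomial-time optimal algorithm.

```cpp
// Illustrative sketch (not the paper's algorithm): given live ranges
// annotated with bit widths, compute the register count and total bit demand
// at each program point; the maxima are lower bounds on any register bank.
#include <algorithm>
#include <cstdio>
#include <vector>

struct LiveRange {
    int start, end;   // program points [start, end], inclusive
    int width_bits;   // e.g. a 1-bit flag vs. a 32-bit counter
};

int main() {
    std::vector<LiveRange> vars = {
        {0, 4, 32},   // v0
        {2, 6, 1},    // v1
        {3, 8, 16},   // v2
        {7, 9, 32},   // v3
    };

    int last_point = 0;
    for (const auto& v : vars) last_point = std::max(last_point, v.end);

    int max_regs = 0, max_bits = 0;
    for (int p = 0; p <= last_point; ++p) {
        int regs = 0, bits = 0;
        for (const auto& v : vars)
            if (v.start <= p && p <= v.end) { ++regs; bits += v.width_bits; }
        max_regs = std::max(max_regs, regs);
        max_bits = std::max(max_bits, bits);
    }
    std::printf("lower bound: %d registers, %d bits in total\n",
                max_regs, max_bits);
    return 0;
}
```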
Making no-fuss compiler fuzzing effective
Alex Groce, Rijnard van Tonder, G. Kalburgi, Claire Le Goues
DOI: https://doi.org/10.1145/3497776.3517765
Published: 2022-03-18
Abstract: Developing a bug-free compiler is difficult; modern optimizing compilers are among the most complex software systems humans build. Fuzzing is one way to identify subtle compiler bugs that are hard to find with human-constructed tests. Grammar-based fuzzing, however, requires a grammar for a compiler’s input language, and can miss bugs induced by code that does not actually satisfy the grammar the compiler should accept. Grammar-based fuzzing also seldom uses advanced modern fuzzing techniques based on coverage feedback. However, modern mutation-based fuzzers are often ineffective for testing compilers because most inputs they generate do not even come close to getting past the parsing stage of compilation. This paper introduces a technique for taking a modern mutation-based fuzzer (AFL in our case, but the method is general) and augmenting it with operators taken from mutation testing, and program splicing. We conduct a controlled study to show that our hybrid approaches significantly improve fuzzing effectiveness qualitatively (consistently finding unique bugs that baseline approaches do not) and quantitatively (typically finding more unique bugs in the same time span, despite fewer program executions). Our easy-to-apply approach has allowed us to report more than 100 confirmed and fixed bugs in production compilers, and found a bug in the Solidity compiler that earned a security bounty.
Citations: 5
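The splicing operator mentioned in the abstract can be pictured with a toy sketch: combine a random prefix of one seed test case with a random suffix of another, so the result is more likely to resemble parseable input than raw byte mutations. This is an illustration of the idea only, not the paper's AFL integration or its mutation-testing operators.

```cpp
// Toy sketch of program splicing for compiler fuzzing: glue a prefix of one
// seed program to a suffix of another to form a new candidate input.
#include <cstdio>
#include <random>
#include <string>

static std::string splice(const std::string& a, const std::string& b,
                          std::mt19937& rng) {
    if (a.empty() || b.empty()) return a + b;
    std::uniform_int_distribution<size_t> cut_a(0, a.size() - 1);
    std::uniform_int_distribution<size_t> cut_b(0, b.size() - 1);
    return a.substr(0, cut_a(rng)) + b.substr(cut_b(rng));
}

int main() {
    std::mt19937 rng(42);
    std::string seed1 = "int f(int x) { return x + 1; }";
    std::string seed2 = "int g(void) { for (int i = 0; i < 3; ++i) {} return 0; }";
    for (int i = 0; i < 3; ++i)
        std::printf("candidate %d: %s\n", i, splice(seed1, seed2, rng).c_str());
    return 0;
}
```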
Seamless deductive inference via macros
Arash Sahebolamri, Thomas Gilray, Kristopher K. Micinski
DOI: https://doi.org/10.1145/3497776.3517779
Published: 2022-03-18
Abstract: We present an approach to integrating state-of-art bottom-up logic programming within the Rust ecosystem, demonstrating it with Ascent, an extension of Datalog that performs well against comparable systems. Rust’s powerful macro system permits Ascent to be compiled uniformly with the Rust code it’s embedded in and to interoperate with arbitrary user-defined components written in Rust, addressing a challenge in real-world use of logic programming languages: the fact that logical programs are parts of bigger software systems and need to interoperate with other components written in imperative programming languages. We leverage Rust’s trait system to extend Datalog semantics with non-powerset lattices, much like Flix, and with user-defined data types much like Formulog and Souffle. We use Ascent to re-implement the Rust borrow checker, a static analysis required by the Rust compiler. We evaluate our performance against Datafrog, Flix, and Soufflé using the borrow checker and other benchmarks, observing comparable performance to Datafrog and Soufflé, and speedups of around two orders of magnitude compared to Flix.
Citations: 6
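Ascent itself is a Rust macro, so the snippet below is not Ascent code; it is a language-neutral C++ sketch of the bottom-up, semi-naive fixpoint evaluation that Datalog rules such as path(x,z) :- edge(x,y), path(y,z) boil down to, shown on a transitive-closure example with a hard-coded edge relation.

```cpp
// Semi-naive bottom-up evaluation of transitive closure, the evaluation
// scheme that Datalog engines like Ascent compile rules into.
#include <cstdio>
#include <set>
#include <utility>

using Fact = std::pair<int, int>;

int main() {
    std::set<Fact> edge = {{1, 2}, {2, 3}, {3, 4}};

    // path(x, y) :- edge(x, y).
    std::set<Fact> path = edge;
    std::set<Fact> delta = edge;   // facts derived in the previous round

    // path(x, z) :- edge(x, y), path(y, z).
    // Semi-naive: only join against facts that are new this round.
    while (!delta.empty()) {
        std::set<Fact> next;
        for (const auto& [x, y] : edge)
            for (const auto& [y2, z] : delta)
                if (y == y2 && !path.count({x, z})) next.insert({x, z});
        for (const auto& f : next) path.insert(f);
        delta = std::move(next);
    }

    for (const auto& [x, y] : path) std::printf("path(%d, %d)\n", x, y);
    return 0;
}
```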
BinPointer: towards precise, sound, and scalable binary-level pointer analysis
Sun Hyoung Kim, Dongrui Zeng, Cong Sun, Gang Tan
DOI: https://doi.org/10.1145/3497776.3517776
Published: 2022-03-18
Abstract: Binary-level pointer analysis is critical to binary-level applications such as reverse engineering and binary debloating. In this paper, we propose BinPointer, a new binary-level interprocedural pointer analysis that relies on an offset-sensitive value-tracking analysis to achieve high precision. We also propose a soundness and precision evaluation methodology based on runtime memory accesses triggered by reference input data. Our experimental results demonstrate that BinPointer has higher precision over prior work, while maintaining acceptable scalability. The soundness of BinPointer is also validated through runtime data.
Citations: 2
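As a much-simplified picture of why offset sensitivity matters in binary-level value tracking, the toy sketch below keeps a (base, offset) abstract value per register for a three-instruction straight-line "binary" in an invented instruction form. BinPointer's interprocedural, memory-aware analysis is far richer; every opcode and name here is made up for illustration.

```cpp
// Toy offset-sensitive value tracking: because the abstract value records a
// base plus a concrete offset, the load below resolves to GLOBAL_TABLE+8
// rather than just "somewhere in GLOBAL_TABLE".
#include <cstdio>
#include <map>
#include <string>
#include <vector>

struct AbsVal { bool known = false; std::string base; long offset = 0; };

struct Insn { std::string op, dst, src; long imm; };

int main() {
    // A tiny straight-line "binary" in a made-up three-address form.
    std::vector<Insn> code = {
        {"lea",  "r1", "GLOBAL_TABLE", 0},  // r1 = &GLOBAL_TABLE
        {"addi", "r1", "r1", 8},            // r1 += 8
        {"load", "r2", "r1", 0},            // r2 = *[r1 + 0]
    };

    std::map<std::string, AbsVal> reg;
    for (const auto& in : code) {
        if (in.op == "lea") {
            reg[in.dst] = {true, in.src, 0};
        } else if (in.op == "addi") {
            AbsVal v = reg[in.src];
            v.offset += in.imm;
            reg[in.dst] = v;                 // stays offset-precise
        } else if (in.op == "load") {
            const AbsVal p = reg[in.src];
            if (p.known)
                std::printf("load reads %s+%ld\n", p.base.c_str(),
                            p.offset + in.imm);
            else
                std::printf("load reads an unknown address\n");
            reg[in.dst] = {};                // loaded value: unknown here
        }
    }
    return 0;
}
```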
On the computation of interprocedural weak control closure
A. Masud, B. Lisper
DOI: https://doi.org/10.1145/3497776.3517782
Published: 2022-03-18
Abstract: Many program analysis techniques depend on capturing the control dependencies of the program. Most existing control dependence algorithms either compute intraprocedural control dependencies only, or they compute control dependence relations that are not precise in general including nonterminating systems. Weak control closure (WCC) subsumes all known nontermination insensitive control dependence relations, including those that are appropriate for nonterminating systems. In this paper, we provide the first formal development of an algorithm to compute the WCC for interprocedural programs capturing the weak form of interprocedural control dependencies. The method is widely applicable due to the generality of WCC. Theorems on the theoretical results of soundness, precision, and the worst-case complexity of our method are also included. We have compared our algorithm with a WCC computation algorithm based on a state-of-the-art interprocedural control dependence computation algorithm. The latter algorithm loses soundness, and we improve the precision by 15.21% on all our experimental benchmarks. This empirical evidence suggests that our algorithm is more effective for any client application of WCC requiring interprocedural program analysis.
Citations: 1
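For context on what weak control closure generalizes, here is a textbook sketch of classical intraprocedural, nontermination-insensitive control dependence computed from post-dominator sets on a hard-coded four-node CFG. The paper's contribution, an interprocedural WCC algorithm with soundness and precision guarantees, goes well beyond this sketch.

```cpp
// Classical control dependence from post-dominator sets (the notion that WCC
// subsumes), computed the textbook way on a small hard-coded CFG.
#include <cstdio>
#include <set>
#include <vector>

int main() {
    // CFG: 0 -> {1, 2}, 1 -> {3}, 2 -> {3}, 3 is the exit. Node 0 branches.
    const int n = 4, exit_node = 3;
    std::vector<std::vector<int>> succ = {{1, 2}, {3}, {3}, {}};

    // Post-dominator sets by fixpoint:
    // pdom(exit) = {exit}; pdom(v) = {v} plus the intersection of pdom(s)
    // over all successors s of v.
    std::set<int> all;
    for (int v = 0; v < n; ++v) all.insert(v);
    std::vector<std::set<int>> pdom(n, all);
    pdom[exit_node] = {exit_node};

    bool changed = true;
    while (changed) {
        changed = false;
        for (int v = 0; v < n; ++v) {
            if (v == exit_node) continue;
            std::set<int> meet = all;
            for (int s : succ[v]) {
                std::set<int> tmp;
                for (int x : meet) if (pdom[s].count(x)) tmp.insert(x);
                meet = tmp;
            }
            meet.insert(v);
            if (meet != pdom[v]) { pdom[v] = meet; changed = true; }
        }
    }

    // w is control dependent on v iff w post-dominates some successor of v
    // but does not strictly post-dominate v.
    for (int v = 0; v < n; ++v)
        for (int w = 0; w < n; ++w) {
            bool some_succ = false;
            for (int s : succ[v]) some_succ |= pdom[s].count(w) > 0;
            bool strictly = (w != v) && pdom[v].count(w) > 0;
            if (some_succ && !strictly)
                std::printf("%d is control dependent on %d\n", w, v);
        }
    return 0;
}
```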
Software pre-execution for irregular memory accesses in the HBM era
Sanyam Mehta, G. Elsesser, Terry Greyzck
DOI: https://doi.org/10.1145/3497776.3517783
Published: 2022-03-18
Abstract: The introduction of High Bandwidth Memory (HBM) necessitates the use of intelligent software prefetching in irregular applications to utilize the surplus bandwidth. In this work, we propose Software Pre-execution (SPE), a technique that relies on pre-executing a minimal copy of the loop of concern (we call the pre-execution loop) for the purpose of prefetching irregular accesses. This is complemented by the compiler's enforcing a certain prefetch distance through apriori strip-mining of the original loop such that the execution of the pre-execution loop is interspersed with the main loop to ensure timeliness of prefetches. We find that this approach provides natural advantages over prior art such as preservation of loop vectorization, handling short loops, avoiding performance bottlenecks, amenability to threading and most importantly, effective coverage. We demonstrate these advantages using a variety of benchmarks on Fujitsu's A64FX processor with HBM2 memory - we outperform prior art by 1.3x and 1.2x when using small and huge pages, respectively. Simulations further show that our approach holds stronger promise on upcoming processors with HBM2e.
Citations: 1
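One possible shape of the code this transformation aims to produce can be sketched by hand: strip-mine the loop and, before each strip of the main (still vectorizable) loop, run a minimal pre-execution copy that only issues prefetches for the irregular accesses. The strip size here is arbitrary, __builtin_prefetch is the GCC/Clang builtin, and the compiler-generated code in the paper is more elaborate than this sketch.

```cpp
// Hand-written sketch of a strip-mined loop with a pre-execution copy that
// prefetches the irregular accesses b[idx[i]] before the main loop uses them.
#include <algorithm>
#include <cstddef>
#include <vector>

double gather_sum(const std::vector<double>& b, const std::vector<int>& idx) {
    constexpr std::size_t kStrip = 256;   // enforces the prefetch distance
    const std::size_t n = idx.size();
    double sum = 0.0;

    for (std::size_t lo = 0; lo < n; lo += kStrip) {
        const std::size_t hi = std::min(lo + kStrip, n);

        // Pre-execution loop: minimal copy that only computes addresses and
        // issues prefetches; no stores, no side effects.
        for (std::size_t i = lo; i < hi; ++i)
            __builtin_prefetch(&b[idx[i]], /*rw=*/0, /*locality=*/1);

        // Main loop: the original (still vectorizable) computation.
        for (std::size_t i = lo; i < hi; ++i)
            sum += b[idx[i]];
    }
    return sum;
}

int main() {
    std::vector<double> b(1 << 20, 1.0);
    std::vector<int> idx;
    for (int i = 0; i < (1 << 20); ++i)
        idx.push_back(static_cast<int>((1LL * i * 7919) % (1 << 20)));
    return gather_sum(b, idx) > 0 ? 0 : 1;
}
```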