2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO): Latest Publications (Page 2)

SCHEMATIC: Compile-Time Checkpoint Placement and Memory Allocation for Intermittent Systems

2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) Pub Date : 2024-03-02 DOI: 10.1109/CGO57630.2024.10444789

Hugo Reymond, Jean-Luc Béchennec, M. Briday, Sébastien Faucou, Isabelle Puaut, Erven Rohou

Battery-free devices enable sensing in hard-to-access locations, opening up new opportunities in various fields such as healthcare, space, or civil engineering. Such devices harvest ambient energy and store it in a capacitor. Due to the unpredictable nature of the harvested energy, a power failure can occur at any time, resulting in a loss of all non-persistent information (e.g., processor registers, data stored in volatile memory). Checkpointing volatile data in non-volatile memory allows the system to recover after a power failure, but raises two issues: (i) spatial and temporal placement of checkpoints; (ii) memory allocation of variables between volatile and non-volatile memory, with the overall objective of using energy as efficiently as possible. While many techniques rely on the developer to address these issues, we present Schematic, a compiler technique that automates checkpoint placement and memory allocation to minimize the overall energy consumption. Schematic ensures that programs will eventually terminate (forward progress property). Moreover, checkpoint placement and memory allocation adapt to the size of the energy buffer and the capacity of volatile memory. Schematic takes advantage of volatile memory (VM) to reduce the energy consumed by automatically placing the most used variables in VM. We tested Schematic for different experimental settings (size of volatile memory and capacitor), and results show an average energy reduction of 51% compared to related techniques.

Pages: 258-269

Citations: 0
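The forward-progress property in the SCHEMATIC abstract can be illustrated with a deliberately simplified model, sketched below under stated assumptions: a straight-line program is a list of blocks with known energy costs, execution restarts from the last checkpoint after a power failure, and every interval between checkpoints (including the cost of the checkpoint that closes it) must fit in one full capacitor, otherwise the program could re-execute the same interval forever. The names `place_checkpoints`, `block_costs`, `budget` and `ckpt_cost` are hypothetical, and this greedy pass is not Schematic's algorithm, which also handles control flow and VM/NVM variable allocation.

```python
from typing import List


def place_checkpoints(block_costs: List[float], budget: float, ckpt_cost: float) -> List[int]:
    """Return indices i such that a checkpoint is taken right after block i.

    Hypothetical model: a full capacitor holds `budget` energy units and the
    program start acts as an implicit checkpoint. Keeping every interval
    (plus its closing checkpoint) within `budget` guarantees forward progress.
    """
    checkpoints: List[int] = []
    used = 0.0  # energy consumed since the last checkpoint
    for i, cost in enumerate(block_costs):
        if cost + ckpt_cost > budget:
            raise ValueError(f"block {i} cannot complete within one energy cycle")
        if used + cost + ckpt_cost > budget:
            # Running block i would overflow the budget: checkpoint before it.
            checkpoints.append(i - 1)
            used = 0.0
        used += cost
    return checkpoints
```

A larger `budget` (a bigger capacitor) yields fewer checkpoints, which mirrors the abstract's point that placement adapts to the size of the energy buffer.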

CGO 2024 Sponsors and Supporters

2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) Pub Date : 2024-03-02 DOI: 10.1109/cgo57630.2024.10444821

Citations: 0

Retargeting and Respecializing GPU Workloads for Performance Portability

2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) Pub Date : 2024-03-02 DOI: 10.1109/CGO57630.2024.10444828

Ivan R. Ivanov, O. Zinenko, Jens Domke, Toshio Endo, William S. Moses

In order to come close to peak performance, accelerators like GPUs require significant architecture-specific tuning that understands the availability of shared memory, parallelism, tensor cores, etc. Unfortunately, the pursuit of higher performance and lower costs has led to a significant diversification of architecture designs, even from the same vendor. This creates the need for performance portability across different GPUs, which is especially important for programs written in a particular programming model with a certain architecture in mind. Even when the program can be seamlessly executed on a different architecture, it may suffer a performance penalty because it is not sized appropriately to the available hardware resources such as fast memory and registers, not to mention that it does not use newer advanced features of the architecture. We propose a new approach to improving the performance of (legacy) CUDA programs for modern machines by automatically adjusting the amount of work each parallel thread does, and the amount of memory and register resources it requires. By operating within the MLIR compiler infrastructure, we are also able to target AMD GPUs by performing automatic translation from CUDA and simultaneously adjusting the program granularity to fit the size of target GPUs. Combined with autotuning assisted by the platform-specific compiler, our approach demonstrates a 27% geomean speedup on the Rodinia benchmark suite over the baseline CUDA implementation, as well as performance parity between similar NVIDIA and AMD GPUs executing the same CUDA program.

Pages: 119-132

Citations: 0
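Adjusting "the amount of work each parallel thread does" is commonly known as thread coarsening. The sketch below only simulates the idea sequentially on the CPU and is not the paper's MLIR-based implementation: a vector-add kernel written for one element per thread is generalized so that each simulated thread handles `factor` consecutive elements, and the launch uses correspondingly fewer threads. The helpers `launch` and `vector_add` are hypothetical names.

```python
import math
from typing import Callable, List


def launch(n_threads: int, body: Callable[[int], None]) -> None:
    """Simulate a 1-D GPU launch by running each thread id in turn."""
    for tid in range(n_threads):
        body(tid)


def vector_add(a: List[float], b: List[float], factor: int = 1) -> List[float]:
    """Element-wise a + b with `factor` elements per simulated thread."""
    n = len(a)
    out = [0.0] * n

    def body(tid: int) -> None:
        # A coarsened thread covers a contiguous chunk of `factor` elements.
        start = tid * factor
        for i in range(start, min(start + factor, n)):
            out[i] = a[i] + b[i]

    # The number of launched threads shrinks as the coarsening factor grows.
    launch(math.ceil(n / factor), body)
    return out
```

The result is the same for every `factor`; on real hardware the choice trades parallelism against per-thread register and shared-memory use, which is why a tuner is needed to pick it. GPU code often assigns elements to coarsened threads with a stride rather than in contiguous chunks so that neighboring threads access neighboring addresses.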

Instruction Scheduling for the GPU on the GPU

2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) Pub Date : 2024-03-02 DOI: 10.1109/CGO57630.2024.10444869

Ghassan Shobaki, Pınar Muyan-Özçelik, Josh Hutton, Bruce Linck, Vladislav Malyshenko, Austin Kerbow, Ronaldo Ramirez-Ortega, Vahl Scott Gordon

In this paper, we show how to use the GPU to parallelize a precise instruction scheduling algorithm that is based on Ant Colony Optimization (ACO). ACO is a nature-inspired intelligent-search technique that has been used to compute precise solutions to NP-hard problems in operations research (OR). Such intelligent-search techniques had not previously been used to solve NP-hard compiler optimization problems because they require substantially more computation than the heuristic techniques used in production compilers. In this work, we show that parallelizing such a compute-intensive technique on the GPU makes using it in compilation reasonably practical. The register-pressure-aware instruction scheduling problem addressed in this work is a multi-objective optimization problem that is significantly more complex than the problems that were previously solved using parallel ACO on the GPU. We describe a number of techniques that we have developed to efficiently parallelize an ACO algorithm for solving this multi-objective optimization problem on the GPU. The target processor is also a GPU. Our experimental evaluation shows that parallel ACO-based scheduling on the GPU runs up to 27 times faster than sequential ACO-based scheduling on the CPU, which reduces the total compile time of the rocPRIM benchmarks by 21%. ACO-based scheduling improves the execution speed of the compiled benchmarks by up to 74% relative to AMD's production scheduler. To the best of our knowledge, our work is the first successful attempt to parallelize a compiler optimization algorithm on the GPU.

Pages: 435-447

Citations: 0
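To make the ACO idea in the abstract above concrete, the sketch below builds list schedules for a small dependence DAG: each "ant" repeatedly picks a ready instruction with probability proportional to a pheromone value, and after each iteration the pheromone evaporates and the best schedule found so far reinforces its (position, instruction) choices. This is a hypothetical, sequential, single-objective toy that minimizes the peak number of live values as a stand-in for register pressure; the names `peak_pressure` and `aco_schedule` and all parameter values are assumptions, and it does not reproduce the paper's multi-objective, GPU-parallel algorithm.

```python
import random
from typing import Dict, List, Set, Tuple


def peak_pressure(order: List[str], preds: Dict[str, List[str]]) -> int:
    """Peak number of simultaneously live values in a schedule.

    A value becomes live when its instruction is scheduled and dies once every
    user has been scheduled; values with no users are never counted as live.
    `preds[v]` lists v's operands (assumed free of duplicates).
    """
    remaining = {n: sum(n in ops for ops in preds.values()) for n in preds}
    live: Set[str] = set()
    peak = 0
    for inst in order:
        for p in preds[inst]:
            remaining[p] -= 1
            if remaining[p] == 0:
                live.discard(p)  # last use: the register can be reused
        if remaining[inst] > 0:
            live.add(inst)
        peak = max(peak, len(live))
    return peak


def aco_schedule(preds: Dict[str, List[str]], n_ants: int = 20, n_iters: int = 30,
                 evaporation: float = 0.1, seed: int = 0) -> Tuple[List[str], int]:
    rng = random.Random(seed)
    nodes = list(preds)
    # tau[(pos, inst)]: learned desirability of scheduling inst at position pos.
    tau = {(k, v): 1.0 for k in range(len(nodes)) for v in nodes}
    best_order: List[str] = []
    best_cost = float("inf")
    for _ in range(n_iters):
        for _ in range(n_ants):
            done: Set[str] = set()
            order: List[str] = []
            while len(order) < len(nodes):
                ready = [v for v in nodes
                         if v not in done and all(p in done for p in preds[v])]
                weights = [tau[(len(order), v)] for v in ready]
                pick = rng.choices(ready, weights=weights)[0]
                order.append(pick)
                done.add(pick)
            cost = peak_pressure(order, preds)
            if cost < best_cost:
                best_order, best_cost = order, cost
        for key in tau:  # evaporation
            tau[key] *= 1.0 - evaporation
        for k, v in enumerate(best_order):  # reinforce the best-so-far schedule
            tau[(k, v)] += 1.0 / (1 + best_cost)
    return best_order, int(best_cost)
```

Each ant's construction is independent of the others within an iteration, which is the kind of parallelism a GPU implementation can exploit.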

Representing Data Collections in an SSA Form

2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) Pub Date : 2024-03-02 DOI: 10.1109/CGO57630.2024.10444817

Tommy McMichen, Nathan Greiner, Peter Zhong, Federico Sossai, Atmn Patel, Simone Campanoni

Compiler research and development has treated computation as the primary driver of performance improvements in C/C++ programs, leaving memory optimizations as a secondary consideration. Developers are currently handed the arduous task of describing both the semantics and layout of their data in memory, either manually or via libraries, prematurely lowering high-level data collections to a low-level view of memory for the compiler. Thus, the compiler can only glean conservative information about the memory in a program, e.g., alias analysis, and is further hampered by heavy memory optimizations. This paper proposes the Memory Object Intermediate Representation (MEMOIR), a language-agnostic SSA form for sequential and associative data collections, objects, and the fields contained therein. At the core of MEMOIR is a decoupling of the memory used to store data from that used to logically organize data. Through its SSA form, MEMOIR compilers can perform element-level analysis on data collections, enabling static analysis of the state of a collection or object at any given program point. To illustrate the power of this analysis, we perform dead element elimination, resulting in a 26.6% speedup on mcf from SPECINT 2017. With the degree of freedom to mutate memory layout, our MEMOIR compiler performs field elision and dead field elimination, reducing peak memory usage of mcf by 20.8%.

Pages: 308-321

Citations: 0
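The core idea of putting collections in SSA form is that every update produces a new, immutable version of the collection, so an analysis can reason about the collection's contents at each program point. The sketch below is a hypothetical toy, not the MEMOIR IR: `to_ssa` renders writes as new collection versions, and `dead_writes` runs a backward element-level liveness pass over a straight-line sequence of operations to flag writes whose element is overwritten or never read, in the spirit of the dead element elimination mentioned in the abstract. The operation encoding and both function names are assumptions.

```python
from typing import List, Set, Tuple

# ("write", i) defines a new SSA version of the collection; ("read", i) uses
# the current version. Written values are elided for brevity.
Op = Tuple[str, int]


def to_ssa(ops: List[Op]) -> List[str]:
    """Render ops with explicit collection versions v0, v1, ..."""
    version = 0
    lines: List[str] = []
    for kind, idx in ops:
        if kind == "write":
            lines.append(f"v{version + 1} = write(v{version}, {idx}, ...)")
            version += 1
        else:
            lines.append(f"x = read(v{version}, {idx})")
    return lines


def dead_writes(ops: List[Op], live_out: Set[int]) -> List[int]:
    """Positions of writes whose element is never read before being overwritten.

    `live_out` holds the element indices still read after this region.
    """
    live = set(live_out)
    dead: List[int] = []
    for pos in range(len(ops) - 1, -1, -1):  # backward liveness pass
        kind, idx = ops[pos]
        if kind == "read":
            live.add(idx)
        elif idx in live:
            live.discard(idx)  # this write supplies the value later reads see
        else:
            dead.append(pos)
    return sorted(dead)
```

In this toy, `dead_writes([("write", 0), ("write", 1), ("read", 0), ("write", 0)], set())` flags positions 1 and 3, because element 1 is never read and the final write to element 0 is not live after the region.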

Compile-Time Analysis of Compiler Frameworks for Query Compilation

2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) Pub Date : 2024-03-02 DOI: 10.1109/CGO57630.2024.10444856

Alexis Engelke, Tobias Schwarz

Citations: 0

CGO 2024 Organization

2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) Pub Date : 2024-03-02 DOI: 10.1109/cgo57630.2024.10444881

Citations: 0

Boosting the Performance of Multi-Solver IFDS Algorithms with Flow-Sensitivity Optimizations

2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) Pub Date : 2024-03-02 DOI: 10.1109/CGO57630.2024.10444884

Haofeng Li, Jie Lu, Haining Meng, Liqing Cao, Lian Li, Lin Gao

The IFDS (Inter-procedural, Finite, Distributive, Subset) algorithms are widely used to solve a broad range of analysis problems. In particular, many interesting problems are formulated as multi-solver IFDS problems, which require multiple interleaved IFDS solvers to work together. For instance, taint analysis requires two IFDS solvers running at the same time: a forward solver to propagate tainted data-flow facts and a backward solver to resolve alias relations. For such problems, a large number of additional data-flow facts must be introduced for flow-sensitivity. This often leads to poor performance and scalability, as evident in our experiments and previous work. In this paper, we propose a novel approach to reduce the number of introduced additional data-flow facts while preserving flow-sensitivity and soundness. We have developed a new taint analysis tool, SADROID, and evaluated it on 1,228 open-source Android apps. Evaluation results show that SADROID significantly outperforms FlowDroid (the state-of-the-art multi-solver IFDS taint analysis tool) without affecting precision and soundness: run time is sped up by up to 17.89x and memory usage is reduced by up to 9x.

Pages: 296-307

Citations: 0
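IFDS problems are solved by propagating data-flow facts with a worklist over pairs of (program point, fact), applying a distributive flow function to one fact at a time. The sketch below shows only the forward taint half of the picture described above, intraprocedurally and without the backward alias solver, on a hypothetical three-address program. The statement encoding and the names `flow` and `taint_leaks` are assumptions, not SADROID's or FlowDroid's API.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

ZERO = "0"  # the IFDS "zero" fact: always holds and is used to generate new facts
Stmt = Tuple[str, ...]  # ("source", x) | ("assign", x, y) | ("sink", y)


def flow(stmt: Stmt, fact: str) -> Set[str]:
    """Distributive flow function: maps one incoming fact to outgoing facts."""
    kind = stmt[0]
    if kind == "source":  # x = source()
        return {ZERO, stmt[1]} if fact == ZERO else {fact}
    if kind == "assign":  # x = y
        x, y = stmt[1], stmt[2]
        if fact == y:
            return {x, y}  # taint flows from y into x
        if fact == x:
            return set()  # x is overwritten by an untainted value
        return {fact}
    return {fact}  # sinks leave facts unchanged


def taint_leaks(stmts: List[Stmt], succs: Dict[int, List[int]], entry: int = 0) -> List[int]:
    """Return the sink statements reachable by a tainted value."""
    seen: Set[Tuple[int, str]] = {(entry, ZERO)}
    work = deque(seen)
    leaks: Set[int] = set()
    while work:
        node, fact = work.popleft()  # `fact` holds just before `node`
        stmt = stmts[node]
        if stmt[0] == "sink" and fact == stmt[1]:
            leaks.add(node)
        for out in flow(stmt, fact):
            for nxt in succs.get(node, []):
                if (nxt, out) not in seen:
                    seen.add((nxt, out))
                    work.append((nxt, out))
    return sorted(leaks)
```

Each extra fact multiplies the (point, fact) pairs the worklist may visit, which is why reducing the additional facts needed for flow-sensitivity, as the paper does, matters for time and memory.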

Welcome from the General Chairs

2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) Pub Date : 2024-03-02 DOI: 10.1109/cgo57630.2024.10444811

Citations: 0

Revealing Compiler Heuristics Through Automated Discovery and Optimization

2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) Pub Date : 2024-03-02 DOI: 10.1109/CGO57630.2024.10444847

Volker Seeker, Chris Cummins, Murray Cole, Björn Franke, Kim Hazelwood, Hugh Leather

Tuning compiler heuristics and parameters is well known to improve optimization outcomes dramatically. Prior works have tuned command-line flags and a few expert-identified heuristics. However, an unknown number of heuristics are buried, unmarked and unexposed, inside the compiler as a consequence of decades of development without auto-tuning being foremost in the minds of developers. Many may not even have been considered heuristics by the developers who wrote them. The result is that auto-tuning search and machine learning can optimize only a tiny fraction of what could be possible if all heuristics were available to tune. Manually discovering all of these heuristics hidden among millions of lines of code and exposing them to auto-tuning tools is a Herculean task that is simply not practical. What is needed is a method of automatically finding these heuristics to extract every last drop of potential optimization. In this work, we propose Heureka, a framework that automatically identifies potential heuristics in the compiler that are highly profitable optimization targets and then automatically finds available tuning parameters for those heuristics with minimal human involvement. Our work is based on the following key insight: when the output of a heuristic is modified within an acceptable value range, the calling code using that output will still function correctly and produce semantically correct results. Building on that, we automatically manipulate the output of potential heuristic code in the compiler and use a differential testing approach to decide whether we have found a heuristic. During output manipulation, we also explore acceptable value ranges of the targeted code. Heuristics identified in this way can then be tuned to optimize an objective function. We used Heureka to search for heuristics among eight thousand functions from the LLVM optimization passes, which is about 2% of all available functions. We then used the identified heuristics to tune the compilation of 38 applications from the NAS and Polybench benchmark suites. Compared to an -Oz baseline, we reduce binary sizes by up to 11.6% considering single heuristics only, and by up to 19.5% when stacking the effects of multiple identified tuning targets and applying a random search with minimal search effort. Generalizing from existing analysis results, Heureka needs, on average, a little under an hour on a single machine to identify relevant heuristic targets for a previously unseen application.

Pages: 55-66

Citations: 0
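The key insight in the Heureka abstract (a heuristic's output can be changed within an acceptable range without breaking correctness) suggests a simple differential test: override the candidate function's return value, recompile, and compare the compiled program's behavior against a reference build. The sketch below is a hypothetical toy in which the "compiler" is a loop unroller whose unroll factor comes from a candidate heuristic; all names (`default_unroll_factor`, `compile_sum`, `acceptable_values`) are invented for illustration, whereas the paper's framework operates on functions inside LLVM optimization passes.

```python
from typing import Callable, Iterable, List


def default_unroll_factor(trip_count: int) -> int:
    """Candidate heuristic hidden in the toy compiler (hypothetical)."""
    return 4 if trip_count >= 16 else 1


def compile_sum(data: List[int], unroll: Callable[[int], int]) -> Callable[[], int]:
    """Toy 'compiler': emits a summation loop unrolled by unroll(len(data))."""
    factor = unroll(len(data))
    if factor < 1:
        raise ValueError("unroll factor must be >= 1")  # the caller cannot cope

    def program() -> int:
        total, i, n = 0, 0, len(data)
        while i + factor <= n:  # unrolled body: `factor` additions per trip
            for k in range(factor):
                total += data[i + k]
            i += factor
        while i < n:  # remainder loop for leftover elements
            total += data[i]
            i += 1
        return total

    return program


def acceptable_values(data: List[int], candidates: Iterable[int]) -> List[int]:
    """Differential test: which overridden heuristic outputs preserve behavior?"""
    reference = compile_sum(data, default_unroll_factor)()
    ok: List[int] = []
    for value in candidates:
        try:
            result = compile_sum(data, lambda _n, v=value: v)()
        except ValueError:
            continue  # the override broke the calling code
        if result == reference:
            ok.append(value)
    return ok
```

In this toy, every factor of at least 1 is acceptable, so `acceptable_values(list(range(20)), range(-2, 6))` yields `[1, 2, 3, 4, 5]`. A tuner could then search that range for the value that best optimizes an objective such as code size.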