International Symposium on Code Generation and Optimization, 2003. CGO 2003: latest publications, page 3

Optimization opportunities created by global data reordering

International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date : 2003-03-23 DOI: 10.1109/CGO.2003.1191548

Gadi Haber, M. Klausner, Vadim Eisenberg, Bilha Mendelson, M. Gurevich

Abstract: Memory access has proven to be one of the bottlenecks in modern architectures. Improving memory locality and reducing the number of memory accesses can help relieve this bottleneck. We present a method for link-time profile-based optimization by reordering the global data of the program and modifying its code accordingly. The proposed optimization reorders the entire global data of the program, according to a representative execution rate of each instruction (or basic block) in the code. The data reordering is done in a way that enables the replacement of frequently executed Load instructions, which reference the global data, with fast Add Immediate instructions. In addition, it tries to improve the global data locality and to reduce the total size of the global data area. The optimization was implemented in FDPR (Feedback Directed Program Restructuring), a post-link optimizer, which is part of the IBM AIX operating system for the IBM pSeries servers. Our results on SPECint2000 show a significant improvement of up to 11% (average 3%) in execution time, along with up to 97.9% (average 83%) reduction in memory references to the global variables via the global data access mechanism of the program.
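The reordering idea in this abstract can be illustrated with a toy sketch. It is not FDPR's algorithm: the `GlobalVar` record, the profile counts, and the 32 KiB window (standing in for the reach of a 16-bit signed immediate from an anchor register) are all assumptions made for illustration, and alignment is ignored. The sketch places the most frequently accessed globals first, so that their addresses fall within the assumed immediate range and could be formed by an add-immediate rather than a load.

```python
from typing import NamedTuple


class GlobalVar(NamedTuple):
    name: str
    size: int          # size in bytes
    access_count: int  # hypothetical profile count, not real data


# Assumed reach of a positive 16-bit signed immediate from the anchor.
WINDOW_BYTES = 32 * 1024


def reorder_globals(variables, window=WINDOW_BYTES):
    """Lay out variables hottest-first (alignment ignored).

    Returns (layout, near): layout maps each name to its byte offset from
    the anchor; near is the set of names that fit entirely in the window.
    """
    ordered = sorted(variables, key=lambda v: v.access_count, reverse=True)
    layout = {}
    near = set()
    offset = 0
    for v in ordered:
        layout[v.name] = offset
        if offset + v.size <= window:
            near.add(v.name)
        offset += v.size
    return layout, near


variables = [
    GlobalVar("cold_table", 40000, 3),
    GlobalVar("counter", 8, 90000),
    GlobalVar("flags", 4, 5000),
]
layout, near = reorder_globals(variables)
print(layout)        # {'counter': 0, 'flags': 8, 'cold_table': 12}
print(sorted(near))  # ['counter', 'flags']
```

In this toy layout the large, rarely used table is pushed past the window, while the two hot variables stay within reach of the assumed immediate field.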

Citations: 15

Inlining of mathematical functions in HP-UX for Itanium® 2

International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date : 2003-03-23 DOI: 10.1109/CGO.2003.1191540

James W. Thomas

Citations: 2

Phi-predication for light-weight if-conversion

International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date : 2003-03-23 DOI: 10.1109/CGO.2003.1191544

Weihaw Chuang, B. Calder, J. Ferrante

Abstract: Predicated execution can eliminate hard-to-predict branches and help to enable instruction-level parallelism. Many current predication variants exist where the result update is conditional based upon the outcome of the guarding predicate. However, conditional writing of a register creates a naming problem for an out-of-order processor and can stall the issuing of instructions. This problem arises from potential multiple predicated definitions reaching a use, which is unresolved until the prior predicate values are computed. We focus on a light-weight form of predication, phi-predication, where all predicated instructions write a result value to their register regardless of the predicate value (i.e., even if it is false). Therefore, the predicate does not guard the writing of the result register; it instead acts as a form of selection between two input registers. This eliminates the naming problem for an out-of-order processor. Our phi-predicated ISA is derived from the predicated features of the Multiflow ISA, with extensions to efficiently predicate complex control flow. Our compiler modifications also expand upon prior techniques to provide efficient code generation. We examine the use of phi-predication for an in-order and out-of-order architecture and compare its performance to using select-op and IA64 ISA predication.
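The contrast between a branch and selection-style predication can be sketched in a few lines. This is only a software model of the idea described in the abstract, not the paper's ISA: the function names and the example computation are hypothetical. In the if-converted form, every operation writes its own result unconditionally, and the predicate only chooses between two already-computed inputs.

```python
def branchy(c, a, b):
    """Original control flow: only one arm executes."""
    if c:
        r = a + b
    else:
        r = a - b
    return r


def select(pred, if_true, if_false):
    """Model of a select: always writes a result, chosen by the predicate."""
    return if_true if pred else if_false


def if_converted(c, a, b):
    """Branch-free form: both arms are computed, and every write is unconditional."""
    t1 = a + b               # always written
    t2 = a - b               # always written
    return select(c, t1, t2)  # the predicate selects between two inputs


print(branchy(True, 5, 3), if_converted(True, 5, 3))    # 8 8
print(branchy(False, 5, 3), if_converted(False, 5, 3))  # 2 2
```

Because no write is conditional in `if_converted`, each value has exactly one definition, which is the property the abstract credits with removing the naming problem.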

Citations: 20

Dynamic binary translation for accumulator-oriented architectures

International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date : 2003-03-23 DOI: 10.1109/CGO.2003.1191530

Ho-Seop Kim, James E. Smith

Abstract: A dynamic binary translation system for a co-designed virtual machine is described and evaluated. The underlying hardware directly executes an accumulator-oriented instruction set that exposes instruction dependence chains (strands) to a distributed microarchitecture containing a simple instruction pipeline. To support conventional program binaries, a source instruction set (Alpha in our study) is dynamically translated to the target accumulator instruction set. The binary translator identifies chains of inter-instruction dependences and assigns them to dependence-carrying accumulators. Because the underlying superscalar microarchitecture is capable of dynamic instruction scheduling, the binary translation system does not perform aggressive optimizations or re-schedule code; this significantly reduces binary translation overhead. Detailed timing simulation of the dynamically translated code running on an accumulator-based distributed microarchitecture shows the overall system is capable of achieving similar performance to an ideal out-of-order superscalar processor, ignoring the significant clock frequency advantages that the accumulator-based hardware is likely to have. As part of the study, we evaluate an instruction set modification that simplifies precise trap implementation. This approach significantly reduces the number of instructions required for register state copying, thereby improving performance. We also observe that translation chaining methods can have a substantial impact on performance, and we evaluate a number of chaining methods.
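As a rough illustration of what identifying dependence chains might look like, the sketch below groups a straight-line instruction sequence into strands with a simple greedy rule. It is an assumption-laden toy, not the translator evaluated in the paper: the `(dest, sources)` instruction encoding, the `form_strands` name, and the rule itself (extend a strand only from its most recent result) are hypothetical. Each strand index can be read as the accumulator that would carry the chain.

```python
def form_strands(instructions):
    """Group instructions into dependence chains (strands).

    instructions: list of (dest_register, [source_registers]).
    Returns a list of strands, each a list of instruction indices.
    """
    strands = []
    tail_of = {}  # register -> strand whose most recent instruction produced it
    for i, (dest, srcs) in enumerate(instructions):
        # Continue the first strand whose latest value this instruction reads.
        sid = next((tail_of[s] for s in srcs if s in tail_of), None)
        if sid is None:
            sid = len(strands)  # no producer found: start a new strand
            strands.append([])
        strands[sid].append(i)
        # The strand's previous tail value is no longer its latest result.
        for reg in [r for r, s in tail_of.items() if s == sid]:
            del tail_of[reg]
        tail_of[dest] = sid
    return strands


program = [
    ("r1", ["r9"]),        # 0: starts strand 0
    ("r2", ["r1"]),        # 1: consumes r1, extends strand 0
    ("r3", ["r8"]),        # 2: starts strand 1
    ("r4", ["r2", "r3"]),  # 3: extends strand 0 via r2
    ("r5", ["r3"]),        # 4: extends strand 1 via r3
]
print(form_strands(program))  # [[0, 1, 3], [2, 4]]
```

Instruction 3 reads values from both strands; in this toy it joins the first matching one, and the other value would have to be communicated through some shared register in a real design.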

Citations: 41

Addressing mode selection

International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date : 2003-03-23 DOI: 10.1109/CGO.2003.1191557

E. Eckstein, Bernhard Scholz

Abstract: Many processor architectures provide a set of addressing modes in their address generation units. For example, digital signal processors (DSPs) have powerful addressing modes for efficiently implementing numerical algorithms. Typical addressing modes of DSPs are auto post-modification and indexing for address registers. The selection of the optimal addressing modes in terms of minimal code size and minimal execution time depends on many parameters and is NP-complete in general. In this work we present a new approach for solving the addressing mode selection (AMS) problem. We provide a method for modeling the target architecture's addressing modes as cost functions for a partitioned Boolean quadratic optimization problem (PBQP). For solving the PBQP we present an efficient and effective way to implement large matrices for representing the cost model. We have integrated the addressing mode selection with the Atair C-Compiler for the uPD7705x DSP from NEC. In our experiments we show that the addressing mode selection can be optimally solved for almost all benchmark programs and the compile-time overhead of the addressing mode selection is within acceptable bounds for a production DSP compiler.
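To make the PBQP formulation concrete, here is a minimal sketch that solves a tiny instance by exhaustive search. It is not the solver described in the paper, and brute force is exponential in the number of nodes. Each node stands for one memory access with a cost vector over its candidate addressing modes, and each edge carries a matrix of costs for pairs of choices; the function name and all cost values below are made up for illustration.

```python
from itertools import product


def solve_pbqp(node_costs, edge_costs):
    """Minimise the sum of node costs plus edge costs by trying every assignment.

    node_costs: list of cost vectors, one per node (index = chosen mode).
    edge_costs: dict mapping (i, j) to a matrix m, where m[a][b] is the cost
                of node i choosing mode a while node j chooses mode b.
    Returns (best_cost, best_choice).
    """
    best_cost, best_choice = float("inf"), None
    for choice in product(*(range(len(c)) for c in node_costs)):
        cost = sum(node_costs[i][k] for i, k in enumerate(choice))
        cost += sum(m[choice[i]][choice[j]] for (i, j), m in edge_costs.items())
        if cost < best_cost:
            best_cost, best_choice = cost, choice
    return best_cost, best_choice


# Two accesses, two hypothetical modes each (0 = indexed, 1 = post-modify).
node_costs = [[1, 3], [2, 1]]
edge_costs = {(0, 1): [[0, 4], [1, 0]]}  # illustrative transition costs
print(solve_pbqp(node_costs, edge_costs))  # (3, (0, 0))
```

Picking the locally cheapest mode for each access gives `(0, 1)` with total cost 6, whereas the joint optimum `(0, 0)` costs 3. This interaction between neighbouring choices is what the edge matrices capture.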

Citations: 8

Predicate-aware scheduling: a technique for reducing resource constraints

International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date : 2003-03-23 DOI: 10.1109/CGO.2003.1191543

M. Smelyanskiy, S. Mahlke, E. Davidson, H. Lee

Abstract: Predicated execution enables the removal of branches wherein segments of branching code are converted into straight-line segments of conditional operations. An important, but generally ignored, side effect of this transformation is that the compiler must assign distinct resources to all the predicated operations at a given time to ensure that those resources are available at run-time. However, a resource is only put to productive use when the predicates associated with its operations evaluate to True. We propose predicate-aware scheduling to reduce the superfluous commitment of resources to operations whose predicates evaluate to False at run-time. The central idea is to assign multiple operations to the same resource at the same time, thereby oversubscribing its use. This assignment is intelligently performed to ensure that no two operations simultaneously assigned to the same resource will have both of their predicates evaluate to True. Thus, no resource is dynamically oversubscribed. The overall effect of predicate-aware scheduling is to use resources more efficiently, thereby increasing performance when resource constraints are a bottleneck.
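The central idea, letting operations with mutually exclusive predicates share a resource in the same cycle, can be sketched as a greedy packing. This is a simplified model rather than the paper's scheduler: predicates are encoded as hypothetical `(condition, polarity)` pairs, `None` marks an unpredicated operation, and two predicates are treated as disjoint only when they test the same condition with opposite polarity.

```python
def disjoint(p, q):
    """True if predicates p and q can never both be true (toy model)."""
    return p is not None and q is not None and p[0] == q[0] and p[1] != q[1]


def assign_slots(ops):
    """Pack operations for one cycle into resource slots.

    ops: list of (name, predicate). An operation joins an existing slot only
    if its predicate is disjoint from that of every operation already there.
    Returns a list of slots, each a list of operation names.
    """
    slots = []
    for name, pred in ops:
        for slot in slots:
            if all(disjoint(pred, other) for _, other in slot):
                slot.append((name, pred))
                break
        else:
            slots.append([(name, pred)])  # no compatible slot: use a new one
    return [[n for n, _ in slot] for slot in slots]


ops = [
    ("add", ("p", True)),
    ("sub", ("p", False)),  # complementary to add's predicate
    ("mul", None),          # unpredicated: always executes
    ("ld", ("q", True)),    # unrelated predicate: cannot be proven disjoint
]
print(assign_slots(ops))  # [['add', 'sub'], ['mul'], ['ld']]
```

Four operations fit into three slots here because `add` and `sub` can never both execute, so at most one of them uses the shared resource at run-time.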

Citations: 22

Dynamic trace selection using performance monitoring hardware sampling

International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date : 2003-03-23 DOI: 10.1109/CGO.2003.1191535

Howard Chen, W. Hsu, Dong-yuan Chen

Citations: 48

Optimizations to prevent cache penalties for the Intel® Itanium® 2 processor

International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date : 2003-03-23 DOI: 10.1109/CGO.2003.1191537

J. Collard, Daniel M. Lavery

Citations: 4

Design, implementation and evaluation of adaptive recompilation with on-stack replacement

International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date : 2003-03-23 DOI: 10.1109/CGO.2003.1191549

Stephen J. Fink, Feng Qian

Citations: 150

International Symposium on Code Generation and Optimization. CGO 2003

International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date : 1900-01-01 DOI: 10.1109/CGO.2003.1191528

Cgo

Citations: 32