International Symposium on Code Generation and Optimization, 2003 (CGO 2003): Latest Publications

Coupling on-line and off-line profile information to improve program performance
International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date: 2003-03-23. DOI: 10.1109/CGO.2003.1191534
C. Krintz
Abstract: In this paper we describe a novel execution environment for Java programs that substantially improves execution performance by incorporating both on-line and off-line profile information to guide dynamic optimization. By using both types of profile collection techniques, we are able to exploit the strengths of each constituent approach: profile accuracy and low overhead. Such coupling also reduces the negative impact of these approaches when each is used in isolation. On-line profiling introduces overhead for dynamic instrumentation, measurement, and decision making. Off-line profile information can be inaccurate when program inputs for execution and optimization differ from those used for profiling. To combat these drawbacks and to achieve the benefits from both on-line and off-line profiling, we developed a dynamic compilation system (based on JikesRVM) that makes use of both. As a result, we are able to improve Java program performance by 9% on average for the programs studied.
Citations: 52
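The coupling described in this abstract can be pictured with a small, hypothetical sketch: an offline profile seeds the optimizer's hotness estimates, and cheap online sampling only corrects them when the current input behaves differently, which is how the accuracy/overhead trade-off gets balanced. Every name below (HotnessEstimator, choose_opt_level, the thresholds) is invented for illustration and does not come from the paper or from JikesRVM.

```python
# Hypothetical sketch: seed dynamic-optimization decisions with an offline
# profile, then let cheap online sample counts correct it for the current run.

OFFLINE_PROFILE = {"Foo.bar": 0.42, "Foo.baz": 0.03}   # fraction of past run time

class HotnessEstimator:
    def __init__(self, offline, trust=0.7):
        self.offline = offline          # profile gathered on training inputs
        self.online = {}                # sample counts from the current run
        self.total_samples = 0
        self.trust = trust              # weight given to the offline estimate

    def sample(self, method):
        """Called by a cheap timer-based sampler in the running VM."""
        self.online[method] = self.online.get(method, 0) + 1
        self.total_samples += 1

    def hotness(self, method):
        """Blend the offline and online estimates of how hot a method is."""
        off = self.offline.get(method, 0.0)
        if self.total_samples == 0:
            return off
        on = self.online.get(method, 0) / self.total_samples
        return self.trust * off + (1.0 - self.trust) * on

    def choose_opt_level(self, method):
        h = self.hotness(method)
        if h > 0.25:
            return "opt2"               # aggressive recompilation
        if h > 0.05:
            return "opt1"
        return "baseline"

est = HotnessEstimator(OFFLINE_PROFILE)
for _ in range(30):
    est.sample("Foo.bar")
print(est.choose_opt_level("Foo.bar"))  # -> "opt2"
```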
An infrastructure for adaptive dynamic optimization
International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date: 2003-03-23. DOI: 10.1109/CGO.2003.1191551
Derek Bruening, Timothy Garnett, Saman P. Amarasinghe
Abstract: Dynamic optimization is emerging as a promising approach to overcome many of the obstacles of traditional static compilation. But while there are a number of compiler infrastructures for developing static optimizations, there are very few for developing dynamic optimizations. We present a framework for implementing dynamic analyses and optimizations. We provide an interface for building external modules, or clients, for the DynamoRIO dynamic code modification system. This interface abstracts away many low-level details of the DynamoRIO runtime system while exposing a simple and powerful, yet efficient and lightweight, API. This is achieved by restricting optimization units to linear streams of code and using adaptive levels of detail for representing instructions. The interface is not restricted to optimization and can be used for instrumentation, profiling, dynamic translation, etc. To demonstrate the usefulness and effectiveness of our framework, we implemented several optimizations. These improve the performance of some applications by as much as 40% relative to native execution. The average speedup relative to base DynamoRIO performance is 12%.
Citations: 571
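The abstract's key design point is a client interface whose optimization units are linear streams of code. The real DynamoRIO client API is a C API; the fragment below is only a language-neutral Python mock of the callback pattern it describes (register a hook, receive a linear instruction list, return a rewritten list), with every class and function name invented for illustration.

```python
# Purely illustrative mock (NOT the DynamoRIO client API): a runtime hands each
# linear instruction stream to registered client callbacks, which may rewrite
# it before the code is emitted.

class Instr:
    def __init__(self, opcode, operands=()):
        self.opcode = opcode
        self.operands = operands

    def __repr__(self):
        return f"{self.opcode} {', '.join(map(str, self.operands))}".strip()

class MockRuntime:
    def __init__(self):
        self.trace_hooks = []

    def register_trace_hook(self, fn):
        """A client registers a callback that sees each linear code fragment."""
        self.trace_hooks.append(fn)

    def emit_trace(self, instrs):
        for hook in self.trace_hooks:
            instrs = hook(instrs)
        return instrs

def strip_nops(instrs):
    """Toy client pass: delete nops from the linear instruction stream."""
    return [i for i in instrs if i.opcode != "nop"]

rt = MockRuntime()
rt.register_trace_hook(strip_nops)
trace = [Instr("mov", ("eax", 1)), Instr("nop"), Instr("add", ("eax", "ebx"))]
print(rt.emit_trace(trace))   # -> [mov eax, 1, add eax, ebx]
```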
Improving quasi-dynamic schedules through region slip
International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date: 2003-03-23. DOI: 10.1109/CGO.2003.1191541
Francesco Spadini, Brian Fahs, Sanjay J. Patel, S. Lumetta
Abstract: Modern processors perform dynamic scheduling to achieve better utilization of execution resources. A schedule created at run time is often better than one created at compile time, as it can dynamically adapt to specific events encountered during execution. In this paper, we examine some fundamental impediments to effective static scheduling. More specifically, we examine why schedules generated quasi-dynamically by a low-level runtime optimizer and executed on a statically scheduled machine perform worse than a dynamically scheduled approach. We observe that such schedules suffer because of region boundaries and a skewed distribution of parallelism toward the beginning of a region. To overcome these limitations, we investigate a new concept, region slip, in which the schedules of different statically scheduled regions can be interleaved in the processor issue queue to reduce the region-boundary effects that cause empty issue slots.
Citations: 10
Compiler optimization-space exploration
International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date: 2003-03-23. DOI: 10.1109/CGO.2003.1191546
Spyridon Triantafyllis, Manish Vachharajani, David I. August
Abstract: To meet the demands of modern architectures, optimizing compilers must incorporate an ever larger number of increasingly complex transformation algorithms. Since code transformations may often degrade performance or interfere with subsequent transformations, compilers employ predictive heuristics to guide optimizations by predicting their effects a priori. Unfortunately, the unpredictability of optimization interaction and the irregularity of today's wide-issue machines severely limit the accuracy of these heuristics. As a result, compiler writers may temper high-variance optimizations with overly conservative heuristics or may exclude these optimizations entirely. While this process results in a compiler capable of generating good average code quality across the target benchmark set, it comes at the cost of missed optimization opportunities in individual code segments. To replace predictive heuristics, researchers have proposed compilers that explore many optimization options, selecting the best one a posteriori. Unfortunately, these existing iterative compilation techniques are not practical, for reasons of compile time and applicability. We present the Optimization-Space Exploration (OSE) compiler organization, the first practical iterative compilation strategy applicable to optimizations in general-purpose compilers. Instead of replacing predictive heuristics, OSE uses the compiler writer's knowledge encoded in the heuristics to select a small number of promising optimization alternatives for a given code segment. Compile time is limited by evaluating only these alternatives, and only for hot code segments, using a general compile-time performance estimator. An OSE-enhanced version of Intel's highly tuned, aggressively optimizing production compiler for IA-64 yields a significant performance improvement, more than 20% in some cases, on Itanium for SPEC codes.
Citations: 269
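The OSE idea in the abstract above, heuristics prune the space to a few candidate configurations and a static estimator picks among them, can be captured in a tiny hypothetical loop. The configuration knobs, the pruning rule, and the cycle model below are all invented placeholders, not anything from Intel's compiler.

```python
# Hypothetical OSE-style loop: for each hot code segment, evaluate only a
# small, heuristically chosen set of optimization configurations with a
# compile-time performance estimator and keep the best one.
import itertools

CONFIGS = [
    {"unroll": u, "if_convert": ic}
    for u, ic in itertools.product((1, 2, 4), (False, True))
]

def promising_configs(segment, k=3):
    """Stand-in for the compiler writer's heuristics: prune the space to k options."""
    ranked = sorted(CONFIGS, key=lambda c: abs(c["unroll"] - segment["loop_size"] // 8))
    return ranked[:k]

def estimate_cycles(segment, config):
    """Stand-in for a general compile-time performance estimator."""
    cycles = segment["base_cycles"] / config["unroll"]
    if config["if_convert"] and segment["branchy"]:
        cycles *= 0.9                       # fewer mispredictions after if-conversion
    return cycles + 5 * config["unroll"]    # crude code-size / i-cache penalty

def ose_compile(segment):
    """Pick the configuration the estimator ranks best for this hot segment."""
    return min(promising_configs(segment), key=lambda c: estimate_cycles(segment, c))

hot_segment = {"loop_size": 16, "base_cycles": 400, "branchy": True}
print(ose_compile(hot_segment))   # -> {'unroll': 2, 'if_convert': True}
```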
The Transmeta Code Morphing™ Software: using speculation, recovery, and adaptive retranslation to address real-life challenges
International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date: 2003-03-23. DOI: 10.1109/CGO.2003.1191529
James C. Dehnert, Brian K. Grant, John Banning, Richard Johnson, Thomas Kistler, Alexander Klaiber, Jim Mattson
Abstract: Transmeta's Crusoe microprocessor is a full, system-level implementation of the x86 architecture, comprising a native VLIW microprocessor with a software layer, the Code Morphing Software (CMS), that combines an interpreter, dynamic binary translator, optimizer, and run-time system. In its general structure, CMS resembles other binary translation systems described in the literature, but it is unique in several respects. The wide range of PC workloads that CMS must handle gracefully in real-life operation, plus the need for full system-level x86 compatibility, expose several issues that have received little or no attention in previous literature, such as exceptions and interrupts, I/O, DMA, and self-modifying code. In this paper we discuss some of the challenges raised by these issues, and present the techniques developed in Crusoe and CMS to meet those challenges. The key to these solutions is the Crusoe paradigm of aggressive speculation, recovery to a consistent x86 state using unique hardware commit-and-rollback support, and adaptive retranslation when exceptions occur too often to be handled efficiently by interpretation.
Citations: 306
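The commit-and-rollback paradigm named at the end of this abstract follows a familiar shape: run a translated region speculatively against working state, commit if it completes, roll back to the last consistent x86 state and fall back to interpretation if an exception occurs. The toy model below shows only that shape; it does not reflect any actual Crusoe or CMS data structures, and all names are invented.

```python
# Toy model of commit-and-rollback speculation (illustrative only).

class SpeculativeState:
    def __init__(self, initial):
        self.committed = dict(initial)   # last consistent x86 state
        self.working = dict(initial)     # state the translated region mutates

    def commit(self):
        self.committed = dict(self.working)

    def rollback(self):
        self.working = dict(self.committed)

def interpret(state, region):
    """Stand-in for the slow, exception-precise interpreter."""
    pass

def run_translation(state, region):
    try:
        for reg, value in region(state.working):   # speculative native execution
            state.working[reg] = value
        state.commit()                              # region finished: make it architectural
    except Exception:
        state.rollback()                            # recover the consistent x86 state
        interpret(state, region)                    # re-execute precisely

st = SpeculativeState({"eax": 0})
run_translation(st, lambda regs: [("eax", regs["eax"] + 1)])
print(st.committed["eax"])   # -> 1
```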
Speculative register promotion using advanced load address table (ALAT)
International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date: 2003-03-23. DOI: 10.1109/CGO.2003.1191539
Jin Lin, Tong Chen, W. Hsu, P. Yew
Abstract: The pervasive use of pointers with complicated patterns in C programs often constrains compiler alias analysis to yield conservative register allocation and promotion. Speculative register promotion with hardware support has the potential to more aggressively promote memory references into registers in the presence of aliases. This paper studies the use of the advanced load address table (ALAT), a data speculation feature defined in the IA-64 architecture, for speculative register promotion. An algorithm for speculative register promotion based on partial redundancy elimination is presented. The algorithm is implemented in Intel's Open Research Compiler (ORC). Experiments on SPEC CPU2000 benchmark programs show that speculative register promotion can improve the performance of some benchmarks by 1% to 7%.
Citations: 30
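The ALAT lets the compiler keep a memory value in a register across possibly aliasing stores: an advanced load (ld.a on IA-64) records the address, any store to that address invalidates the entry, and a later check (chk.a) triggers recovery if invalidation happened. The Python below models only those semantics as a sketch; it is not IA-64 code and not the paper's PRE-based promotion algorithm.

```python
# Toy model of ALAT-style speculative register promotion (semantics only).

class ALAT:
    def __init__(self):
        self.entries = {}              # address -> register holding the promoted value

    def advanced_load(self, memory, addr, reg, regs):
        """ld.a: load into a register and record the address in the ALAT."""
        regs[reg] = memory[addr]
        self.entries[addr] = reg

    def store(self, memory, addr, value):
        """Any store snoops the ALAT and invalidates a matching entry."""
        memory[addr] = value
        self.entries.pop(addr, None)

    def check(self, memory, addr, reg, regs):
        """chk.a: if the entry was invalidated, recover by reloading."""
        if self.entries.get(addr) != reg:
            regs[reg] = memory[addr]

mem = {0x100: 7}
regs = {}
alat = ALAT()
alat.advanced_load(mem, 0x100, "r32", regs)   # speculatively promote *p into r32
alat.store(mem, 0x100, 9)                     # possibly aliasing store: *q = 9
alat.check(mem, 0x100, "r32", regs)           # recovery: r32 is reloaded
print(regs["r32"])                            # -> 9
```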
Optimization for the Intel® Itanium® architecture register stack
International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date: 2003-03-23. DOI: 10.1109/CGO.2003.1191538
A. Settle, D. Connors, Gerolf Hoflehner, Daniel M. Lavery
Abstract: The Intel® Itanium® architecture contains a number of innovative compiler-controllable features designed to exploit instruction-level parallelism. New code generation and optimization techniques are critical to applying these features to improve processor performance. For instance, the Itanium® architecture provides a compiler-controllable virtual register stack to reduce the penalty of memory accesses associated with procedure calls. The Itanium® Register Stack Engine (RSE) transparently manages the register stack and saves and restores physical registers to and from memory as needed. Existing code generation techniques for the register stack aggressively allocate virtual registers without regard to the register pressure on different control-flow paths. As such, applications with large data sets may stress the RSE and cause substantial execution delays due to the high number of register saves and restores. Since the Itanium® architecture is developed around Explicitly Parallel Instruction Computing (EPIC) concepts, solutions for increasing the register stack efficiency favor code generation techniques rather than hardware approaches.
Citations: 7
Adaptive online context-sensitive inlining
International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date: 2003-03-23. DOI: 10.1109/CGO.2003.1191550
K. Hazelwood, D. Grove
Abstract: As current trends in software development move toward more complex object-oriented programming, inlining has become a vital optimization that provides substantial performance improvements to C++ and Java programs. Yet the aggressiveness of the inlining algorithm must be carefully monitored to effectively balance performance and code size. The state of the art is to use profile information (associated with call edges) to guide inlining decisions. In the presence of virtual method calls, profile information for one call edge may not be sufficient for making effectual inlining decisions. Therefore, we explore the use of profiling data with additional levels of context sensitivity. In addition to exploring fixed levels of context sensitivity, we explore several adaptive schemes that attempt to find the ideal degree of context sensitivity for each call site. Our techniques are evaluated on the basis of runtime performance, code size, and dynamic compilation time. On average, we found that with minimal impact on performance (±1%), context sensitivity can enable 10% reductions in compiled code space and compile time. Performance on individual programs varied from -4.2% to 5.3%, while reductions in compile time and code space of up to 33.0% and 56.7%, respectively, were obtained.
Citations: 58
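Context-sensitive call profiles key edge counts by a suffix of the caller stack, and an adaptive scheme grows that suffix length per call site only when the extra context pays off. The sketch below illustrates one such policy under simple assumptions of my own (grow the depth when no single target dominates); all identifiers are invented and this is not the paper's algorithm or the Jikes RVM implementation.

```python
# Illustrative sketch: call-edge counts keyed by the last k callers, with a
# per-call-site context depth that adapts when the target distribution is mixed.
from collections import defaultdict

class ContextProfile:
    def __init__(self, max_depth=3):
        self.max_depth = max_depth
        self.depth = defaultdict(lambda: 1)                   # chosen depth per call site
        self.counts = defaultdict(lambda: defaultdict(int))   # (site, context) -> target -> count

    def record(self, call_stack, site, target):
        k = self.depth[site]
        self.counts[(site, tuple(call_stack[-k:]))][target] += 1

    def dominant_target(self, call_stack, site, threshold=0.8):
        """Suggest an inline target only if one target dominates under the current context."""
        k = self.depth[site]
        dist = self.counts.get((site, tuple(call_stack[-k:])), {})
        total = sum(dist.values())
        if not total:
            return None
        target, count = max(dist.items(), key=lambda kv: kv[1])
        if count / total >= threshold:
            return target
        if self.depth[site] < self.max_depth:
            self.depth[site] += 1        # adaptive step: try more context next time
        return None

prof = ContextProfile()
for _ in range(9):
    prof.record(["main", "draw"], site="shape.area()", target="Circle.area")
prof.record(["main", "draw"], site="shape.area()", target="Square.area")
print(prof.dominant_target(["main", "draw"], "shape.area()"))   # -> "Circle.area"
```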
Integrated prepass scheduling for a Java just-in-time compiler on the IA-64
International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date: 2003-03-23. DOI: 10.1109/CGO.2003.1191542
T. Inagaki, H. Komatsu, T. Nakatani
Abstract: We present a new integrated prepass scheduling (IPS) algorithm for a Java just-in-time (JIT) compiler which integrates register minimization into list scheduling. We use backtracking in the list scheduling when we have used up all the available registers. To reduce the overhead of backtracking, we incrementally maintain a set of candidate instructions for undoing scheduling. To maximize the ILP after undoing scheduling, we select an instruction chain with the smallest increase in the total execution time. We implemented our new algorithm in a production-level Java JIT compiler for the Intel Itanium processor. Experiments showed that, compared to the best known algorithm by Govindarajan et al., our IPS algorithm improved performance by up to 1.8% while reducing the compilation time for IPS by 58% on average.
Citations: 10
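The core idea, list scheduling that is aware of register pressure, is easy to show in miniature. The sketch below is a deliberately simplified, hypothetical scheduler: at the pressure limit it merely prefers ready instructions that close live ranges, whereas the paper's actual IPS algorithm backtracks and undoes a scheduled instruction chain; that backtracking step is omitted here.

```python
# Simplified, hypothetical register-pressure-aware list scheduling.

def list_schedule(instrs, deps, priority, last_use, num_regs):
    """instrs: instruction ids; deps[i]: ids that must precede i;
    priority[i]: larger is more urgent; last_use[i]: live ranges i closes."""
    scheduled, live = [], 0
    remaining = set(instrs)
    while remaining:
        ready = [i for i in remaining if deps[i] <= set(scheduled)]
        if live >= num_regs:
            # At the pressure limit, prefer instructions that free registers.
            relievers = [i for i in ready if last_use[i] > 0]
            ready = relievers or ready
        pick = max(ready, key=lambda i: priority[i])
        scheduled.append(pick)
        remaining.remove(pick)
        live += 1 - last_use[pick]      # each instruction defines one value here
    return scheduled

instrs = ["a", "b", "c", "d"]
deps = {"a": set(), "b": set(), "c": {"a"}, "d": {"a", "b", "c"}}
priority = {"a": 4, "b": 3, "c": 2, "d": 1}
last_use = {"a": 0, "b": 0, "c": 1, "d": 3}
print(list_schedule(instrs, deps, priority, last_use, num_regs=2))  # -> ['a', 'b', 'c', 'd']
```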
TEST: a Tracer for Extracting Speculative Threads
International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date: 2003-03-23. DOI: 10.1109/CGO.2003.1191554
M. Chen, K. Olukotun
Abstract: Thread-level speculation (TLS) allows sequential programs to be arbitrarily decomposed into threads that can be safely executed in parallel. A key challenge for TLS processors is choosing thread decompositions that speed up the program. Current techniques for identifying decompositions have practical limitations in real systems. Traditional parallelizing compilers do not work effectively on most integer programs, and software profiling slows down program execution too much for real-time analysis. The Tracer for Extracting Speculative Threads (TEST) is hardware support that analyzes sequential program execution to estimate the performance of possible thread decompositions. This hardware is used in a dynamic parallelization system that automatically transforms unmodified, sequential Java programs to run on TLS processors. In this system, the best thread decompositions found by TEST are dynamically recompiled to run speculatively. The paper describes the analysis performed by TEST and presents simulation results demonstrating its effectiveness on real programs. Estimates are also provided showing that the tracer requires minimal hardware additions to our speculative chip multiprocessor (less than 1% of the total transistor count) and causes only minor slowdowns to programs during analysis (3-25%).
Citations: 54
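Estimating a decomposition's value from a sequential trace amounts to predicting how much of the observed work would overlap under speculation and how much would be redone after dependence violations. The back-of-the-envelope model below is a hypothetical illustration of that style of estimate, not the analysis TEST actually performs in hardware.

```python
# Hypothetical trace-based estimate of TLS speedup for a loop decomposition:
# violated iterations are assumed to re-execute serially, the rest overlap.

def estimated_speedup(iter_times, num_cpus, violation_rate):
    sequential = sum(iter_times)
    wasted = violation_rate * sequential          # work redone after violations
    parallel = sequential / num_cpus + wasted
    return sequential / parallel

trace = [120, 95, 130, 110, 100, 125]   # cycles per iteration, from a sequential trace
for cpus in (2, 4):
    print(cpus, round(estimated_speedup(trace, cpus, violation_rate=0.1), 2))
# 2 -> 1.67, 4 -> 2.86
```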