Proceedings of the ... CGO : International Symposium on Code Generation and Optimization. International Symposium on Code Generation and Optimization: latest articles, page 2

Reactive techniques for controlling software speculation

Proceedings of the ... CGO : International Symposium on Code Generation and Optimization. International Symposium on Code Generation and Optimization Pub Date : 2005-03-20 DOI: 10.1109/CGO.2005.30

C. Zilles, Naveen Neelakantam

Abstract: Aggressive software speculation holds significant potential, because it enables program transformations to reduce the program's critical path. Like any form of speculation, however, the key to software speculation is employing it only where it is likely to succeed. While mechanisms for controlling hardware speculation (e.g., saturating counters updated after each instance) are well understood, these techniques do not translate directly to software techniques because changing a speculation requires changing the code. As it stands, the dominant software speculation control technique, non-reactive profile-guided optimization, lacks the robustness to support aggressive speculation. The primary thesis of this paper is that software speculation can be made to be robust by adding a reactive controller that can dynamically adjust the speculation. We make two primary observations about such systems: 1) reactive control systems can select behaviors on which to speculate with performance that equals or exceeds self-training, and 2) such control systems are remarkably latency tolerant. Although reactivity is required, it can be done at a low frequency; latencies of hundreds of thousands, or even millions of cycles, can be tolerated for most actions. Together these two characteristics imply that robust aggressive software speculation is a realistic goal.

Pages: 305-316
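
The abstract above names saturating counters as the well-understood hardware mechanism for controlling speculation. As a reminder of how such a counter behaves, here is a minimal sketch. The class name, the `should_speculate` threshold, and the `simulate` helper are illustrative assumptions and do not come from the paper.

```python
class SaturatingCounter:
    """n-bit saturating counter: counts up on success, down on failure."""

    def __init__(self, bits=2):
        self.max_value = (1 << bits) - 1
        # Start just below the threshold (weakly "do not speculate").
        self.value = self.max_value // 2

    def update(self, succeeded):
        # Clamp at both ends instead of wrapping around.
        if succeeded:
            self.value = min(self.value + 1, self.max_value)
        else:
            self.value = max(self.value - 1, 0)

    def should_speculate(self):
        # Speculate only in the upper half of the counter's range.
        return self.value > self.max_value // 2


def simulate(outcomes, bits=2):
    """Return the decision taken before each outcome is observed."""
    counter = SaturatingCounter(bits)
    decisions = []
    for outcome in outcomes:
        decisions.append(counter.should_speculate())
        counter.update(outcome)
    return decisions
```

In hardware this update happens after every instance. The abstract's point is that a software controller cannot flip its decision that cheaply, because each change means changing the code, which is why tolerance to long reaction latencies matters.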

Citations: 5

Performance of runtime optimization on BLAST

Proceedings of the ... CGO : International Symposium on Code Generation and Optimization. International Symposium on Code Generation and Optimization Pub Date : 2005-03-20 DOI: 10.1109/CGO.2005.25

A. Das, Jiwei Lu, Howard Chen, Jinpyo Kim, P. Yew, W. Hsu, Dong-yuan Chen

Citations: 11

Effective adaptive computing environment management via dynamic optimization

Proceedings of the ... CGO : International Symposium on Code Generation and Optimization. International Symposium on Code Generation and Optimization Pub Date : 2005-03-20 DOI: 10.1109/CGO.2005.17

Shiwen Hu, M. Valluri, L. John

Abstract: To minimize the surging power consumption of microprocessors, adaptive computing environments (ACEs) where microarchitectural resources can be dynamically tuned to match a program's runtime requirement and characteristics are becoming increasingly common. Adaptive computing environments usually have multiple configurable hardware units, necessitating exploration of a large number of combinatorial configurations in order to identify the most energy-efficient configuration. In this paper, we propose a scheme for efficient management of multiple configurable units, utilizing the inherent capabilities of dynamic optimization systems. Most dynamic optimizers typically detect dominant code regions (hotspots). We develop an ACE management scheme where hotspot boundaries are used for phase detection and adaptation. Since hotspots are of variable sizes and are often nested, program phase behavior which is hierarchical in nature is automatically captured in this technique. To demonstrate the usefulness and effectiveness of our framework, we use the proposed framework to dynamically adapt the sizes of L1 data and L2 caches that have different reconfiguration latencies and overheads. Our technique reduces L1D and L2 cache energy consumption by 47% and 58%, while a popular previously proposed technique only achieves reduction of 32% and 52% respectively.

Pages: 63-73
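
To make the idea of adapting at hotspot boundaries concrete, the sketch below tries each configuration once on successive entries to a hotspot and then settles on the cheapest one. It is only an illustration under stated assumptions: `HotspotTuner`, the `measure_energy` callback, and the single flat list of configurations are all hypothetical, and the sketch ignores the nesting of hotspots and the multiple configurable units that the paper addresses.

```python
class HotspotTuner:
    """Pick a per-hotspot configuration by trying each option once."""

    def __init__(self, configs, measure_energy):
        self.configs = configs
        # Hypothetical callback: (hotspot, config) -> measured energy.
        self.measure_energy = measure_energy
        self.trials = {}  # hotspot -> {config: energy}
        self.best = {}    # hotspot -> chosen config

    def on_hotspot_entry(self, hotspot):
        """Return the configuration to apply for this invocation."""
        if hotspot in self.best:
            return self.best[hotspot]
        tried = self.trials.setdefault(hotspot, {})
        for config in self.configs:
            if config not in tried:
                tried[config] = self.measure_energy(hotspot, config)
                if len(tried) == len(self.configs):
                    # Exploration finished: remember the cheapest option.
                    self.best[hotspot] = min(tried, key=tried.get)
                return config
```

Keying the decision on the hotspot rather than on a fixed instruction interval is what lets a later entry to the same region reuse the earlier choice without re-exploring.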

Citations: 15

Maintaining consistency and bounding capacity of software code caches

Proceedings of the ... CGO : International Symposium on Code Generation and Optimization. International Symposium on Code Generation and Optimization Pub Date : 2005-03-20 DOI: 10.1109/CGO.2005.19

Derek Bruening, Saman P. Amarasinghe

Abstract: Software code caches are becoming ubiquitous, in dynamic optimizers, runtime tool platforms, dynamic translators, fast simulators and emulators, and dynamic compilers. Caching frequently executed fragments of code provides significant performance boosts, reducing the overhead of translation and emulation and meeting or exceeding native performance in dynamic optimizers. One disadvantage of caching, memory expansion, can sometimes be ignored when executing a single application. However, as optimizers and translators are applied more and more in production systems, the memory expansion from running multiple applications simultaneously becomes problematic. A second drawback to caching is the added requirement of maintaining consistency between the code cache and the original code. On architectures like IA-32 that do not require explicit application actions when modifying code, detecting code changes is challenging. Again, consistency can be ignored for certain sets of applications, but as caching systems scale up to executing large, modern, complex programs, consistency becomes critical. This paper presents efficient schemes for keeping a software code cache consistent and for dynamically bounding code cache size to match the current working set of the application. These schemes are evaluated in the DynamoRIO runtime code manipulation system, and operate on stock hardware in the presence of multiple threads and dynamic behavior, including dynamically-loaded, generated, and even modified code.

Pages: 74-85
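
The two requirements named in the abstract, a bound on cache size and consistency with the original code, can be illustrated with a toy cache. This is a sketch under simplifying assumptions and not DynamoRIO's scheme: the capacity is a fixed fragment count with FIFO eviction rather than a bound that adapts to the working set, and `BoundedCodeCache`, `translate`, and `on_code_write` are hypothetical names.

```python
from collections import OrderedDict


class BoundedCodeCache:
    """Toy code cache: fixed capacity, FIFO eviction, write invalidation."""

    def __init__(self, capacity, translate):
        self.capacity = capacity
        self.translate = translate      # hypothetical translator callback
        self.fragments = OrderedDict()  # source address -> cached fragment

    def lookup(self, address):
        """Return the cached fragment, translating on a miss."""
        if address not in self.fragments:
            if len(self.fragments) >= self.capacity:
                # Capacity bound: evict the oldest fragment first.
                self.fragments.popitem(last=False)
            self.fragments[address] = self.translate(address)
        return self.fragments[address]

    def on_code_write(self, start, end):
        """Consistency: drop fragments whose source lies in [start, end)."""
        stale = [a for a in self.fragments if start <= a < end]
        for address in stale:
            del self.fragments[address]
        return len(stale)
```

The hard part the abstract points to is knowing when to call something like `on_code_write` at all, since IA-32 applications modify code without any explicit notification.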

Citations: 42

Automatic generation of high-performance trace compressors

Proceedings of the ... CGO : International Symposium on Code Generation and Optimization. International Symposium on Code Generation and Optimization Pub Date : 2005-03-20 DOI: 10.1109/CGO.2005.6

Martin Burtscher, Nana B. Sam

Citations: 26

Efficient SIMD code generation for runtime alignment and length conversion

Proceedings of the ... CGO : International Symposium on Code Generation and Optimization. International Symposium on Code Generation and Optimization Pub Date : 2005-03-20 DOI: 10.1109/CGO.2005.18

Peng Wu, A. Eichenberger, Amy Wang

Abstract: When generating codes for today's multimedia extensions, one of the major challenges is to deal with memory alignment issues. While hand programming still yields best performing SIMD codes, it is both time consuming and error prone. Compiler technology has greatly improved, including techniques that simdize loops with misaligned accesses by automatically rearranging misaligned memory streams in registers. Current techniques are applicable to runtime alignments, but they aggressively reduce the alignment overhead only when all alignments are known at compile time. This paper presents two major enhancements to the state of the art, improving both performance and coverage. First, we propose a novel technique to simdize loops with runtime alignment nearly as efficiently as those with compile-time misalignment. Runtime alignment is pervasive in real applications because it is either part of the algorithms, or it is an artifact of the compiler's inability to extract accurate alignment information from complex applications. Second, we incorporate length conversion operations, e.g., conversions between data of different sizes, into the alignment handling framework. Length conversions are pervasive in multimedia applications where mixed integer types are often used. Supporting length conversion can greatly improve the coverage of simdizable loops. Experimental results indicate that our runtime alignment technique achieves a 19% to 32% speedup increase over prior art for a benchmark stressing the impact of misaligned data. We also demonstrate speedup factors of up to 8.11 for real benchmarks over sequential execution.

Pages: 153-164
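
The phrase "rearranging misaligned memory streams in registers" can be pictured with a small emulation. On hardware that only supports aligned vector loads, a misaligned vector is assembled from two adjacent aligned loads and a shift. When the alignment is a runtime value, the shift amount is not a compile-time constant. The sketch below uses Python lists as stand-ins for vector registers; the width `V` and the function names are assumptions, and this is not the paper's code generation scheme.

```python
V = 4  # elements per vector register (hypothetical width)


def aligned_load(memory, index):
    """Load the V-element aligned block that contains `index`."""
    base = (index // V) * V
    return memory[base:base + V]


def misaligned_load(memory, index):
    """Emulate a load of V elements starting at an arbitrary index.

    Two aligned loads are combined with a shift whose amount (`offset`)
    is only known at run time.
    """
    offset = index % V
    low = aligned_load(memory, index)
    high = aligned_load(memory, index + V)
    # Concatenate the two registers and keep V elements from `offset`.
    return (low + high)[offset:offset + V]
```

For example, with `memory = list(range(16))`, loading at index 5 combines the aligned blocks starting at 4 and 8 and keeps elements 5 through 8.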

Citations: 79

Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy

Proceedings of the ... CGO : International Symposium on Code Generation and Optimization. International Symposium on Code Generation and Optimization Pub Date : 2005-03-20 DOI: 10.1109/CGO.2005.10

Chun Chen, Jacqueline Chame, Mary W. Hall

Citations: 138

Predicting unroll factors using supervised classification

Proceedings of the ... CGO : International Symposium on Code Generation and Optimization. International Symposium on Code Generation and Optimization Pub Date : 2005-03-20 DOI: 10.1109/CGO.2005.29

M. Stephenson, Saman P. Amarasinghe

Abstract: Compilers base many critical decisions on abstracted architectural models. While recent research has shown that modeling is effective for some compiler problems, building accurate models requires a great deal of human time and effort. This paper describes how machine learning techniques can be leveraged to help compiler writers model complex systems. Because learning techniques can effectively make sense of high dimensional spaces, they can be a valuable tool for clarifying and discerning complex decision boundaries. In this work we focus on loop unrolling, a well-known optimization for exposing instruction level parallelism. Using the Open Research Compiler as a testbed, we demonstrate how one can use supervised learning techniques to determine the appropriateness of loop unrolling. We use more than 2,500 loops - drawn from 72 benchmarks - to train two different learning algorithms to predict unroll factors (i.e., the amount by which to unroll a loop) for any novel loop. The technique correctly predicts the unroll factor for 65% of the loops in our dataset, which leads to a 5% overall improvement for the SPEC 2000 benchmark suite (9% for the SPEC 2000 floating point benchmarks).

Pages: 123-134
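
Framing unroll-factor selection as supervised classification means each loop becomes a feature vector and its best unroll factor becomes the label. The sketch below uses a k-nearest-neighbor vote purely for illustration, since the abstract does not name the two learning algorithms it evaluates. The two features (trip count, body size) and every training example are made up for this sketch and are not data from the paper.

```python
import math
from collections import Counter

# Hypothetical training set: ((trip_count, body_size), best_unroll_factor).
TRAINING = [
    ((4, 40), 1), ((6, 35), 1), ((8, 45), 1),
    ((50, 15), 4), ((60, 12), 4), ((55, 18), 4),
    ((100, 5), 8), ((120, 6), 8), ((90, 4), 8),
]


def predict_unroll_factor(training, features, k=3):
    """Majority vote among the k training loops closest to `features`."""
    by_distance = sorted(training, key=lambda ex: math.dist(ex[0], features))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]
```

A real feature set would be far larger and would need normalization so that no single feature dominates the distance. The sketch skips both.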

Citations: 201

Superword-level parallelism in the presence of control flow

Proceedings of the ... CGO : International Symposium on Code Generation and Optimization. International Symposium on Code Generation and Optimization Pub Date : 2005-03-20 DOI: 10.1109/CGO.2005.33

Jaewook Shin, Mary W. Hall, Jacqueline Chame

Abstract: In this paper, we describe how to extend the concept of superword-level parallelization (SLP), used for multimedia extension architectures, so that it can be applied in the presence of control flow constructs. Superword-level parallelization involves identifying scalar instructions in a large basic block that perform the same operation, and, if dependences do not prevent it, combining them into a superword operation on a multi-word object. A key insight is that we can use techniques related to optimizations for architectures supporting predicated execution, even for multimedia ISAs that do not provide hardware predication. We derive large basic blocks with predicated instructions to which SLP can be applied. We describe how to minimize overheads for superword predicates and re-introduce control flow for scalar operations. We discuss other extensions to SLP to address common features of real multimedia codes. We present automatically-generated performance results on 8 multimedia codes to demonstrate the power of this approach. We observe speedups ranging from 1.97X to 15.07X as compared to both sequential execution and SLP alone.

Pages: 165-175
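
The key step described above, turning a branch into predicated straight-line code so that superword operations can be formed, can be sketched as follows. Lists stand in for superword registers, and `superword_select` emulates a lane-wise select instruction. The function names, the example computation, and the register width are assumptions for illustration and are not taken from the paper.

```python
def scalar_with_branch(a, b):
    """Original form: a branch inside the loop body blocks packing."""
    out = []
    for x, y in zip(a, b):
        if x > y:
            out.append(x - y)
        else:
            out.append(0)
    return out


def superword_select(mask, if_true, if_false):
    """Lane-wise select: emulates a SIMD select instruction."""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]


def predicated_superword(a, b, width=4):
    """Branch-free form: compute every lane, then select by predicate."""
    out = []
    for i in range(0, len(a), width):
        va, vb = a[i:i + width], b[i:i + width]
        mask = [x > y for x, y in zip(va, vb)]  # superword predicate
        diff = [x - y for x, y in zip(va, vb)]  # computed for all lanes
        out.extend(superword_select(mask, diff, [0] * len(va)))
    return out
```

Both functions return the same values. The second does redundant work in lanes where the predicate is false, which is the kind of overhead the abstract says the authors work to minimize.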

Citations: 122

A progressive register allocator for irregular architectures

Proceedings of the ... CGO : International Symposium on Code Generation and Optimization. International Symposium on Code Generation and Optimization Pub Date : 2005-03-20 DOI: 10.1109/CGO.2005.4

D. Koes, S. Goldstein

Citations: 28