International Symposium on Code Generation and Optimization, 2003. CGO 2003.最新文献_第2页

METRIC: tracking down inefficiencies in the memory hierarchy via binary rewriting 度量:通过二进制重写跟踪内存层次结构中的低效率

International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date : 2003-03-23 DOI: 10.1109/CGO.2003.1191553

Jaydeep Marathe, F. Mueller, Tushar Mohan, B. Supinski, S. Mckee, A. Yoo

引用次数: 40

Retargetable and reconfigurable software dynamic translation 可重新定位和可重新配置的软件动态翻译

International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date : 2003-03-23 DOI: 10.1109/CGO.2003.1191531

K. Scott, Naveen Kumar, S. Velusamy, B. Childers, J. Davidson, M. Soffa

引用次数: 227

Jumbo: run-time code generation for Java and its applications Jumbo: Java及其应用程序的运行时代码生成

International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date : 2003-03-23 DOI: 10.1109/CGO.2003.1191532

Samuel N. Kamin, L. Clausen, Ava Jarvis

引用次数: 40

Dynamic profiling and trace cache generation 动态分析和跟踪缓存生成

International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date : 2003-03-23 DOI: 10.1109/CGO.2003.1191552

Marc Berndl, L. Hendren

{"title":"Dynamic profiling and trace cache generation","authors":"Marc Berndl, L. Hendren","doi":"10.1109/CGO.2003.1191552","DOIUrl":"https://doi.org/10.1109/CGO.2003.1191552","url":null,"abstract":"Dynamic program optimization is increasingly important for achieving good runtime performance. A key issue is how to select which code to optimize. One approach is to dynamically detect traces, long sequences of instructions spanning multiple methods, which are likely to execute to completion. Traces are easy to optimize and have been shown to be a good unit for optimization. The paper reports on a new approach for dynamically detecting, creating and storing traces in a Java virtual machine. We first describe four important criteria for a successful trace strategy: good instruction stream coverage, low dispatch rate, cache stability, and optimizability of traces. We then present our approach based on branch correlation graphs. A branch correlation graph stores information about the correlation between pairs of branches, as well as additional state information. We present the complete design for an efficient implementation of the system, including a detailed discussion of the trace cache and profiling mechanisms. We have implemented an experimental framework to measure the traces generated by our approach in a direct-threaded Java VM (SableVM) and we present experimental results to show that the traces we generate meet the design criteria.","PeriodicalId":277590,"journal":{"name":"International Symposium on Code Generation and Optimization, 2003. CGO 2003.","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133615294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 19

Reality-based optimization 以现实为基础的优化

International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date : 2003-03-23 DOI: 10.1109/CGO.2003.1191533

S. McFarling

引用次数: 13

Hiding program slices for software security 隐藏程序片段以保证软件安全

International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date : 2003-03-23 DOI: 10.1109/CGO.2003.1191556

X. Zhang, Rajiv Gupta

{"title":"Hiding program slices for software security","authors":"X. Zhang, Rajiv Gupta","doi":"10.1109/CGO.2003.1191556","DOIUrl":"https://doi.org/10.1109/CGO.2003.1191556","url":null,"abstract":"Given the high cost of producing software, development of technology for prevention of software piracy is important for the software industry. In this paper we present a novel approach for preventing the creation of unauthorized copies of software. Our approach splits software modules into open and hidden components. The open components are installed (executed) on an insecure machine while the hidden components are installed (executed) on a secure machine. We assume that while open components can be stolen, to obtain a fully functioning copy of the software, the hidden components must be recovered. We describe an algorithm that constructs hidden components by slicing the original software components. We argue that recovery of hidden components constructed through slicing, in order to obtain a fully functioning copy of the software, is a complex task. We further develop security analysis to capture the complexity of recovering hidden components. Finally we apply our technique to several large Java programs to study the complexity of recovering constructed hidden components and to measure the runtime overhead introduced by splitting of software into open and hidden components.","PeriodicalId":277590,"journal":{"name":"International Symposium on Code Generation and Optimization, 2003. CGO 2003.","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124813519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 46

Code optimization for code compression 代码压缩的代码优化

International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date : 2003-03-23 DOI: 10.1109/CGO.2003.1191555

M. Drinic, D. Kirovski, Hoi Vo

{"title":"Code optimization for code compression","authors":"M. Drinic, D. Kirovski, Hoi Vo","doi":"10.1109/CGO.2003.1191555","DOIUrl":"https://doi.org/10.1109/CGO.2003.1191555","url":null,"abstract":"With the emergence of software delivery platforms such as Microsoft's .NET, the reduced size of transmitted binaries has become a very important system parameter, strongly affecting system performance. We present two novel pre-processing steps for code compression that explore program binaries' syntax and semantics to achieve superior compression ratios. The first preprocessing step involves heuristic partitioning of a program binary into streams with high auto-correlation. The second preprocessing step uses code optimization via instruction rescheduling in order to improve prediction probabilities for a given compression engine. We have developed three heuristics for instruction rescheduling that explore tradeoffs of the solution quality versus algorithm run-time. The pre-processing steps are integrated with the generic paradigm of prediction by partial matching (PPM) which is the basis of our compression codec. The compression algorithm is implemented for x86 binaries and tested on several large Microsoft applications. Binaries compressed using our compression codec are 18-24% smaller than those compressed using the best available off-the-shelf compressor.","PeriodicalId":277590,"journal":{"name":"International Symposium on Code Generation and Optimization, 2003. CGO 2003.","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125746695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 11

Local scheduling techniques for memory coherence in a clustered VLIW processor with a distributed data cache 具有分布式数据缓存的集群VLIW处理器中内存一致性的本地调度技术

International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date : 2003-03-23 DOI: 10.1109/CGO.2003.1191545

E. Gibert, F. Sánchez, Antonio González

{"title":"Local scheduling techniques for memory coherence in a clustered VLIW processor with a distributed data cache","authors":"E. Gibert, F. Sánchez, Antonio González","doi":"10.1109/CGO.2003.1191545","DOIUrl":"https://doi.org/10.1109/CGO.2003.1191545","url":null,"abstract":"Clustering is a common technique to deal with wire delays. Fully-distributed architectures, where the register file, the functional units and the cache memory are partitioned, are particularly effective to deal with these constraints and besides they are very scalable. However the distribution of the data cache introduces a new problem: memory instructions may reach the cache in an order different to the sequential program order, thus possibly violating its contents. In this paper two local scheduling mechanisms that guarantee the serialization of aliased memory instructions are proposed and evaluated: the construction of memory dependent chains (MDC solution), and two transformations (store replication and load-store synchronization) applied to the original data dependence graph (DDGT solution). These solutions do not require any extra hardware. The proposed scheduling techniques are evaluated for a word-interleaved cache clustered VLIW processor (although these techniques can also be used for any other distributed cache configuration). Results for the Mediabench benchmark suite demonstrate the effectiveness of such techniques. In particular, the DDGT solution increases the proportion of local accesses by 16% compared to MDC, and stall time is reduced by 32% since load instructions can be freely scheduled in any cluster However the MDC solution reduces compute time and it often outperforms the former. Finally the impact of both techniques on an architecture with attraction buffers is studied and evaluated.","PeriodicalId":277590,"journal":{"name":"International Symposium on Code Generation and Optimization, 2003. CGO 2003.","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114919964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 9

Optimal and efficient speculation-based partial redundancy elimination 最优和有效的基于推测的部分冗余消除

International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date : 2003-03-23 DOI: 10.1109/CGO.2003.1191536

Qiong Cai, Jingling Xue

{"title":"Optimal and efficient speculation-based partial redundancy elimination","authors":"Qiong Cai, Jingling Xue","doi":"10.1109/CGO.2003.1191536","DOIUrl":"https://doi.org/10.1109/CGO.2003.1191536","url":null,"abstract":"Existing profile-guided partial redundancy elimination (PRE) methods use speculation to enable the removal of partial redundancies along more frequently executed paths at the expense of introducing additional expression evaluations along less frequently executed paths. While being capable of minimizing the number of expression evaluations in some cases, they are, in general, not computationally optimal in achieving this objective. In addition, the experimental results for their effectiveness are mostly missing. This work addresses the following three problems: (1) Is the computational optimality of speculative PRE solvable in polynomial time? (2) Is edge profiling - less costly than path profiling - sufficient to guarantee the computational optimality? (3) Is the optimal algorithm (if one exists) lightweight enough to be used efficiently in a dynamic compiler? In this paper, we provide positive answers to the first two problems and promising results to the third. We present an algorithm that analyzes edge insertion points based on an edge profile. Our algorithm guarantees optimally that the total number of computations for an expression in the transformed code is always minimized with respect to the edge profile given. This implies that edge profiling, which is less costly than path profiling, is sufficient to guarantee this optimality. The key in the development of our algorithm lies in the removal of some non-essential edges (and consequently, all resulting non-essential nodes) from a flow graph so that the problem of finding an optimal code motion is reduced to one of finding a minimal cut in the reduced (flow) graph thus obtained. We have implemented our algorithm in Intel's Open Runtime Platform (ORP). Our preliminary results over a number of Java benchmarks show that our algorithm is lightweight and can be potentially a practical component in a dynamic compiler. As a result, our algorithm can also be profitably employed in a profile-guided static compiler in which compilation cost can often be sacrificed for code efficiency.","PeriodicalId":277590,"journal":{"name":"International Symposium on Code Generation and Optimization, 2003. CGO 2003.","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131111422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 41

Optimizing memory accesses for spatial computation 优化空间计算的内存访问

International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date : 2003-03-23 DOI: 10.1109/CGO.2003.1191547

M. Budiu, S. Goldstein

{"title":"Optimizing memory accesses for spatial computation","authors":"M. Budiu, S. Goldstein","doi":"10.1109/CGO.2003.1191547","DOIUrl":"https://doi.org/10.1109/CGO.2003.1191547","url":null,"abstract":"We present the internal representation and optimizations used by the CASH compiler for improving the memory parallelism of pointer-based programs. CASH uses an SSA-based representation for memory, which compactly summarizes both control-flow- and dependence information. In CASH, memory optimization is a four-step process: (1) first an initial, relatively coarse representation of memory dependences is built; (2) next, unnecessary memory dependences are removed using dependence tests; (3) third, redundant memory operations are removed (4) finally, parallelism is increased by pipelining memory accesses in loops. While the first three steps above are very general, the loop pipelining transformations are particularly applicable for spatial computation, which is the primary target of CASH. The redundant memory removal optimizations presented are: load/store hoisting (subsuming partial redundancy elimination and common-subexpression elimination), load-after-store removal, store-before-store removal (dead store removal) and loop-invariant load motion. One of our loop pipelining transformations is a new form of loop parallelization, called loop decoupling. This transformation separates independent memory accesses within a loop body into several independent loops, which are allowed dynamically to slip with respect to each other A new computational primitive, a token generator is used to dynamically control the amount of slip, allowing maximum freedom, while guaranteeing that no memory dependences are violated.","PeriodicalId":277590,"journal":{"name":"International Symposium on Code Generation and Optimization, 2003. CGO 2003.","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123051628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 29