Proceedings of the ... CGO : International Symposium on Code Generation and Optimization. International Symposium on Code Generation and Optimization: latest articles

Towards Program Optimization through Automated Analysis of Numerical Precision
Michael D Linderman, Matthew Ho, David L Dill, Teresa H Meng, Garry P Nolan
Abstract: Reducing the arithmetic precision of a computation has real performance implications, including increased speed, decreased power consumption, and a smaller memory footprint. For some architectures, e.g., GPUs, there can be such a large performance difference that using reduced precision is effectively a requirement. The tradeoff is that the accuracy of the computation will be compromised. In this paper we describe a proof assistant and associated static analysis techniques for efficiently bounding numerical and precision-related errors. The programmer/compiler can use these bounds to numerically verify and optimize an application for different input and machine configurations. We present several case study applications that demonstrate the effectiveness of these techniques and the performance benefits that can be achieved with rigorous precision analysis.
DOI: 10.1145/1772954.1772987 (https://doi.org/10.1145/1772954.1772987). Pages 230-237. Published 2010-04-01.
Citations: 57
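To make the accuracy tradeoff in the abstract above concrete, here is a minimal sketch that measures (rather than statically bounds) the error of a reduced-precision sum. It is not the paper's proof assistant or analysis. The helper names `to_f32`, `sum_f32`, and `sum_f64` are hypothetical, and single precision is emulated by rounding through Python's `struct` module after every operation.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Accumulate in emulated single precision, rounding after every add."""
    total = 0.0
    for v in values:
        total = to_f32(total + to_f32(v))
    return total


def sum_f64(values):
    """Accumulate in ordinary double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


# Illustrative input (an assumption, not taken from the paper): 100,000 copies of 0.1.
values = [0.1] * 100_000
exact = 10_000.0
err32 = abs(sum_f32(values) - exact)  # absolute error of the single-precision sum
err64 = abs(sum_f64(values) - exact)  # absolute error of the double-precision sum
print(f"float32 error: {err32:.6g}, float64 error: {err64:.6g}")
```

An empirical check like this covers only one input. The abstract describes static bounds intended to hold across input and machine configurations, which a single measurement cannot provide.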
Phase-aware remote profiling
P. Nagpurkar, C. Krintz, T. Sherwood
Abstract: Recent advances in networking and embedded device technology have made the vision of ubiquitous computing a reality; users can access the Internet's vast offerings anytime and anywhere. Moreover, battery-powered devices such as personal digital assistants and Web-enabled mobile phones have successfully emerged as new access points to the world's digital infrastructure. This ubiquity offers a new opportunity for software developers: users can now participate in the software development, optimization, and evolution process while they use their software. Such participation requires effective techniques for gathering profile information from remote, resource-constrained devices. Further, these techniques must be unobtrusive and transparent to the user; profiles must be gathered using minimal computation, communication, and power. Toward this end, we present a flexible hardware-software scheme for efficient remote profiling. We rely on the extraction of meta information from executing programs in the form of phases, and then use this information to guide intelligent online sampling and to manage the communication of those samples. Our results indicate that phase-based remote profiling can reduce the communication, computation, and energy consumption overheads by 50-75% over random and periodic sampling.
DOI: 10.1109/CGO.2005.26 (https://doi.org/10.1109/CGO.2005.26). Pages 191-202. Published 2005-03-20.
Citations: 35
Compiler managed dynamic instruction placement in a low-power code cache
Rajiv A. Ravindran, Pracheeti D. Nagarkar, Ganesh S. Dasika, E. Marsman, R. Senger, S. Mahlke, Richard B. Brown
Abstract: Modern embedded microprocessors use low power on-chip memories called scratch-pad memories to store frequently executed instructions and data. Unlike traditional caches, scratch-pad memories lack the complex tag checking and comparison logic, thereby proving to be efficient in area and power. In this work, we focus on exploiting scratch-pad memories for storing hot code segments within an application. Static placement techniques focus on placing the most frequently executed portions of programs into the scratch-pad. However, static schemes are inherently limited by not allowing the contents of the scratch-pad memory to change at run time. In a large fraction of applications, the instruction memory footprints exceed the scratch-pad memory size, thereby limiting the usefulness of the scratch-pad. We propose a compiler managed dynamic placement algorithm, wherein multiple hot code sequences, or traces, are overlapped with each other in the scratch-pad memory at different points in time during execution. Special copy instructions are provided to copy the traces into the scratch-pad memory at run-time. Using a power estimate, the compiler initially selects the most frequent traces in an application for relocation into the scratch-pad memory. Through iterative code motion and redundancy elimination, copy instructions are inserted in infrequently executed regions of the code. For a 64-byte code cache, the compiler managed dynamic placement achieves an average of 64% energy improvement over the static solution in a low-power embedded microcontroller.
DOI: 10.1109/CGO.2005.13 (https://doi.org/10.1109/CGO.2005.13). Pages 179-190. Published 2005-03-20.
Citations: 53
Practical path profiling for dynamic optimizers
Michael D. Bond, K. McKinley
Abstract: Modern processors are hungry for instructions. To satisfy them, compilers need to find and optimize execution paths across multiple basic blocks. Path profiles provide this context, but their high overhead has so far limited their use by dynamic compilers. We present new techniques for low overhead online practical path profiling (PPP). Following targeted path profiling (TPP), PPP uses an edge profile to simplify path profile instrumentation (profile-guided profiling). PPP improves over prior work by (1) reducing the amount of profiling instrumentation on cold paths and paths that the edge profile predicts well and (2) reducing the cost of the remaining instrumentation. Experiments in an ahead-of-time compiler perform edge profile-guided inlining and unrolling prior to path profiling instrumentation. These transformations are faithful to staged optimization, and create longer, harder to predict paths. We introduce the branch-flow metric to measure path flow as a function of branch decisions, rather than weighting all paths equally as in prior work. On SPEC2000, PPP maintains high accuracy and coverage, but has only 5% overhead on average (ranging from -3% to 13%), making it appealing for use by dynamic compilers.
DOI: 10.1109/CGO.2005.28 (https://doi.org/10.1109/CGO.2005.28). Pages 205-216. Published 2005-03-20.
Citations: 46
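As background for the path-profiling entry above: acyclic path profilers are commonly built on Ball-Larus-style path numbering, where each edge gets an integer increment so that the increments along any entry-to-exit path sum to a unique path ID. The abstract does not spell out PPP's own instrumentation, so the following is only a sketch of that general numbering idea. The function names and the example CFG are hypothetical.

```python
def number_paths(dag, entry, exit_node):
    """Assign edge increments on an acyclic CFG given as {node: [successors]}.

    Returns (num_paths, edge_val), where num_paths[n] is the number of paths
    from n to exit_node and edge_val[(a, b)] is the increment on edge a -> b.
    """
    num_paths = {}
    edge_val = {}

    def visit(node):
        if node in num_paths:
            return num_paths[node]
        if node == exit_node:
            num_paths[node] = 1
            return 1
        total = 0
        for succ in dag[node]:
            # Paths through this edge get IDs starting after those of the
            # paths through the successors handled earlier.
            edge_val[(node, succ)] = total
            total += visit(succ)
        num_paths[node] = total
        return total

    visit(entry)
    return num_paths, edge_val


def path_id(path, edge_val):
    """Sum the edge increments along a path given as a list of nodes."""
    return sum(edge_val[(a, b)] for a, b in zip(path, path[1:]))


# Hypothetical CFG: two back-to-back diamonds, so four acyclic paths.
cfg = {
    "A": ["B", "C"], "B": ["D"], "C": ["D"],
    "D": ["E", "F"], "E": ["G"], "F": ["G"], "G": [],
}
num_paths, edge_val = number_paths(cfg, "A", "G")
```

With such a numbering, a profiler only needs one counter update per instrumented edge and one table increment at the path end. Reducing how many edges carry instrumentation is the kind of cost the abstract says PPP targets.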
A programmable hardware path profiler
K. Vaswani, M. J. Thazhuthaveetil, Y. Srikant
Abstract: For aggressive path-based program optimizations to be profitable in cost-sensitive environments, accurate path profiles must be available at low overheads. In this paper, we propose a low-overhead, non-intrusive hardware path profiling scheme that can be programmed to detect several types of paths including acyclic, intra-procedural paths, paths for a whole program path and extended paths. The profiler consists of a path stack, which detects paths and generates a sequence of path descriptors using branch information from the processor pipeline, and a hot path table that collects a profile of hot paths for later use by a program optimizer. With assistance from the processor's event detection logic, our profiler can track a host of architectural metrics along paths, enabling context-sensitive performance monitoring and bottleneck analysis. We illustrate the utility of our scheme by associating paths with a power metric that estimates power consumption in the cache hierarchy caused by instructions along the path. Experiments using programs from the SPEC CPU2000 benchmark suite show that our path profiler, occupying 7KB of hardware real-estate, collects accurate path profiles (average overlap of 88% with a perfect profile) at negligible execution time overheads (0.6% on average).
DOI: 10.1109/CGO.2005.3 (https://doi.org/10.1109/CGO.2005.3). Pages 217-228. Published 2005-03-20.
Citations: 40
A model-based framework: an approach for profit-driven optimization
Min Zhao, B. Childers, M. Soffa
Abstract: Although optimizations have been applied for a number of years to improve the performance of software, problems that have been long-standing remain, which include knowing what optimizations to apply and how to apply them. To systematically tackle these problems, we need to understand the properties of optimizations. In our current research, we are investigating the profitability property, which is useful for determining the benefit of applying an optimization. Due to the high cost of applying optimizations and then experimentally evaluating their profitability, we use an analytic model framework for predicting the profitability of optimizations. In this paper, we target scalar optimizations, and in particular, describe framework instances for partial redundancy elimination (PRE) and loop invariant code motion (LICM). We implemented the framework for both optimizations and compare profit-driven PRE and LICM with a heuristic-driven approach. Our experiments demonstrate that a model-based approach is effective and efficient in that it can accurately predict the profitability of optimizations with low overhead. By predicting the profitability using models, we can selectively apply optimizations. The model-based approach does not require tuning of parameters used in heuristic approaches and works well across different code contexts and optimizations.
DOI: 10.1109/CGO.2005.2 (https://doi.org/10.1109/CGO.2005.2). Pages 317-327. Published 2005-03-20.
Citations: 47
A general compiler framework for speculative optimizations using data speculative code motion
Xiaoru Dai, Antonia Zhai, W. Hsu, P. Yew
Abstract: Data speculative optimization refers to code transformations that allow load and store instructions to be moved across potentially dependent memory operations. Existing research work on data speculative optimizations has mainly focused on individual code transformations. The required speculative analysis that identifies data speculative optimization opportunities and the required recovery code generation that guarantees the correctness of their execution are handled separately for each optimization. This paper proposes a new compiler framework to facilitate the design and implementation of general data speculative optimizations such as dead store elimination, redundancy elimination, copy propagation, and code scheduling. This framework allows different data speculative optimizations to share the following: (i) a speculative analysis mechanism to identify data speculative optimization opportunities by ignoring low probability data dependences from optimizations, and (ii) a recovery code generation mechanism to guarantee the correctness of the data speculative optimizations. The proposed recovery code generation is based on data speculative code motion (DSCM) that uses code motion to facilitate a desired transformation. Based on the position of the moved instruction, recovery code can be generated accordingly. The proposed framework greatly simplifies the task of incorporating data speculation into non-speculative optimizations by sharing the recovery code generation and the speculative analysis. We have implemented the proposed framework in the ORC 2.1 compiler and demonstrated its effectiveness on SPEC2000 benchmark programs.
DOI: 10.1109/CGO.2005.1 (https://doi.org/10.1109/CGO.2005.1). Pages 280-290. Published 2005-03-20.
Citations: 23
Context threading: a flexible and efficient dispatch technique for virtual machine interpreters
Marc Berndl, B. Vitale, M. Zaleski, Angela Demke Brown
Abstract: Direct-threaded interpreters use indirect branches to dispatch bytecodes, but deeply-pipelined architectures rely on branch prediction for performance. Due to the poor correlation between the virtual program's control flow and the hardware program counter, which we call the context problem, direct threading's indirect branches are poorly predicted by the hardware, limiting performance. Our dispatch technique, context threading, improves branch prediction and performance by aligning hardware and virtual machine state. Linear virtual instructions are dispatched with native calls and returns, aligning the hardware and virtual PC. Thus, sequential control flow is predicted by the hardware return stack. We convert virtual branching instructions to native branches, mobilizing the hardware's branch prediction resources. We evaluate the impact of context threading on both branch prediction and performance using interpreters for Java and OCaml on the Pentium and PowerPC architectures. On the Pentium IV our technique reduces mean mispredicted branches by 95%. On the PowerPC, it reduces mean branch stall cycles by 75% for OCaml and 82% for Java. Due to reduced branch hazards, context threading reduces mean execution time by 25% for Java and by 19% and 37% for OCaml on the P4 and PPC970, respectively. We also combine context threading with a conservative inlining technique and find its performance comparable to that of selective inlining.
DOI: 10.1109/CGO.2005.14 (https://doi.org/10.1109/CGO.2005.14). Pages 15-26. Published 2005-03-20.
Citations: 50
Collecting and exploiting high-accuracy call graph profiles in virtual machines
Matthew Arnold, D. Grove
Abstract: Due to the high dynamic frequency of virtual method calls in typical object-oriented programs, feedback-directed devirtualization and inlining is one of the most important optimizations performed by high-performance virtual machines. A critical input to effective feedback-directed inlining is an accurate dynamic call graph. In a virtual machine, the dynamic call graph is computed online during program execution. Therefore, to maximize overall system performance, the profiling mechanism must strike a balance between profile accuracy, the speed at which the profile becomes available to the optimizer, and profiling overhead. This paper introduces a new low-overhead sampling-based technique that rapidly converges on a high-accuracy dynamic call graph. We have implemented the technique in two high-performance virtual machines: Jikes RVM and J9. We empirically assess our profiling technique by reporting on the accuracy of the dynamic call graphs it computes and by demonstrating that increasing the accuracy of the dynamic call graph results in more effective feedback-directed inlining.
DOI: 10.1109/CGO.2005.9 (https://doi.org/10.1109/CGO.2005.9). Pages 51-62. Published 2005-03-20.
Citations: 51
Practical and accurate low-level pointer analysis
B. Guo, Matthew J. Bridges, Spyridon Triantafyllis, Guilherme Ottoni, Easwaran Raman, David I. August
Abstract: Pointer analysis is traditionally performed once, early in the compilation process, upon an intermediate representation (IR) with source-code semantics. However, performing pointer analysis only once at this level imposes a phase-ordering constraint, causing alias information to become stale after subsequent code transformations. Moreover, high-level pointer analysis cannot be used at link time or run time, where the source code is unavailable. This paper advocates performing pointer analysis on a low-level intermediate representation. We present the first context-sensitive and partially flow-sensitive points-to analysis designed to operate at the assembly level. As we will demonstrate, low-level pointer analysis can be as accurate as high-level analysis. Additionally, our low-level pointer analysis also enables a quantitative comparison of propagating high-level pointer analysis results through subsequent code transformations, versus recomputing them at the low level. We show that, for C programs, the former practice is considerably less accurate than the latter.
DOI: 10.1109/CGO.2005.27 (https://doi.org/10.1109/CGO.2005.27). Pages 291-302. Published 2005-03-20.
Citations: 54
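For readers unfamiliar with points-to analysis, the sketch below shows a much simpler variant than the one in the abstract above: a flow-insensitive, context-insensitive, inclusion-based (Andersen-style) analysis over four statement kinds. It is not the paper's assembly-level algorithm. The statement encoding and the function name `andersen` are assumptions made for illustration.

```python
from collections import defaultdict


def andersen(stmts):
    """Compute points-to sets by iterating inclusion constraints to a fixed point.

    Each statement is a tuple (kind, lhs, rhs) with kind one of:
      "addr"  : lhs = &rhs
      "copy"  : lhs = rhs
      "load"  : lhs = *rhs
      "store" : *lhs = rhs
    """
    pts = defaultdict(set)

    def add(var, targets):
        """Add targets to pts[var]; report whether the set grew."""
        before = len(pts[var])
        pts[var] |= targets
        return len(pts[var]) != before

    changed = True
    while changed:
        changed = False
        for kind, lhs, rhs in stmts:
            if kind == "addr":
                changed |= add(lhs, {rhs})
            elif kind == "copy":
                changed |= add(lhs, set(pts[rhs]))
            elif kind == "load":
                for target in list(pts[rhs]):
                    changed |= add(lhs, set(pts[target]))
            elif kind == "store":
                for target in list(pts[lhs]):
                    changed |= add(target, set(pts[rhs]))
    return {var: targets for var, targets in pts.items() if targets}


# Hypothetical program: p = &a; q = &b; pp = &p; *pp = q; r = *pp
stmts = [
    ("addr", "p", "a"),
    ("addr", "q", "b"),
    ("addr", "pp", "p"),
    ("store", "pp", "q"),
    ("load", "r", "pp"),
]
result = andersen(stmts)
```

Because statement order is ignored here, `p` is reported as possibly pointing to both `a` and `b`. Flow and context sensitivity, which the abstract's analysis adds in part, exist to remove that kind of imprecision.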