Software and Compilers for Embedded Systems: Latest Papers, Page 3

Decoupled graph-coloring register allocation with hierarchical aliasing

Software and Compilers for Embedded Systems Pub Date : 2011-06-27 DOI: 10.1145/1988932.1988934

A. Tavares, Quentin Colombet, Mariza Bigonha, C. Guillon, Fernando Magno Quintão Pereira, F. Rastello

Abstract: Recent results have shown how to do graph-coloring-based register allocation in a way that decouples spilling from register assignment. This decoupled approach has the main advantage of simplifying the implementation of register allocators. However, the decoupled model, as described in previous works, faces many problems when dealing with register aliasing, a phenomenon typical in architectures usually seen in embedded systems, such as ARM. In this paper we introduce the semi-elementary form, a program representation that brings decoupled register allocation to architectures with register aliasing. The semi-elementary form is much smaller than the program representations used by previous decoupled solutions, thus leading to register allocators that perform better in terms of time and space. Furthermore, this representation reduces the number of copies that traditional allocators insert into assembly programs. We have empirically validated our results by showing how our representation improves two well-known graph-coloring-based allocators, namely the Iterated Register Coalescer (IRC) and Bouchez et al.'s brute force (BF) method, both augmented with Smith et al.'s extensions to handle aliasing. Running our techniques on SPEC CPU 2000, we have reduced the number of nodes in the interference graphs by a factor of 4 to 5, hence speeding up allocation time by a factor of 3 to 5. Additionally, the semi-elementary form reduces by 8% the number of copies that IRC leaves uncoalesced.

Citations: 4

SMT-based optimization for synchronous programs

Software and Compilers for Embedded Systems Pub Date : 2011-06-27 DOI: 10.1145/1988932.1988935

Yu Bai, J. Brandt, K. Schneider

Citations: 4

Enhanced structural analysis for C code reconstruction from IR code

Software and Compilers for Embedded Systems Pub Date : 2011-06-27 DOI: 10.1145/1988932.1988936

Felix Engel, R. Leupers, G. Ascheid, Max Ferger, M. Beemster

Citations: 11

B2P2: bounds based procedure placement for instruction TLB power reduction in embedded systems

Software and Compilers for Embedded Systems Pub Date : 2010-06-28 DOI: 10.1145/1811212.1811215

Reiley Jeyapaul, Aviral Shrivastava

Abstract: High-performance embedded processors are equipped with a Translation Look-aside Buffer (TLB), which is the key ingredient of efficient and speedy virtual memory management. The TLB, though small, is frequently accessed, and therefore not only consumes significant energy but is also one of the important thermal hot-spots in the processor. Among the many circuit and microarchitectural techniques proposed to reduce TLB power consumption, the Use-Last TLB is one very efficient technique in which power is consumed only when different pages are accessed in succession, i.e., when there is a page-switch [26]. Though the Use-Last technique is effective in reducing i-TLB power, there is scope to further improve its effectiveness by changing the relative code placement of the program. In this work, we formulate the code placement problem to minimize the page-switches in a program. We prove that this problem is NP-complete and propose a Bounds Based Procedure Placement (B2P2) heuristic to efficiently reduce the program's page-switches. Our procedure placement technique delivers an average 76% reduction in instruction-TLB power with negligible (< 2%) impact on performance, over and above the reduction achieved by the Use-Last TLB architecture alone.
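Illustrative note (not from the paper): the quantity this abstract minimizes, the number of page-switches, can be sketched in a few lines. The sketch below assumes a 4096-byte page, procedures packed back to back in a chosen order, and a trace of (procedure, offset) instruction fetches. The names `layout` and `count_page_switches` are hypothetical helpers for illustration only, and the B2P2 heuristic itself is not reproduced here.

```python
from typing import Dict, Iterable, Sequence, Tuple

PAGE_SIZE = 4096  # assumed page size in bytes (not stated in the abstract)


def layout(order: Sequence[str], sizes: Dict[str, int]) -> Dict[str, int]:
    """Pack procedures back to back in the given order; return start addresses."""
    starts: Dict[str, int] = {}
    addr = 0
    for name in order:
        starts[name] = addr
        addr += sizes[name]
    return starts


def count_page_switches(
    trace: Iterable[Tuple[str, int]],
    starts: Dict[str, int],
    page_size: int = PAGE_SIZE,
) -> int:
    """Count successive instruction fetches that land on different pages."""
    switches = 0
    prev_page = None
    for proc, offset in trace:
        page = (starts[proc] + offset) // page_size
        if prev_page is not None and page != prev_page:
            switches += 1
        prev_page = page
    return switches


# Hypothetical program: procedures "a" and "c" call each other repeatedly.
sizes = {"a": 3000, "b": 3000, "c": 1000}
trace = [("a", 0), ("c", 0)] * 4

far = count_page_switches(trace, layout(["a", "b", "c"], sizes))   # "c" lands on page 1
near = count_page_switches(trace, layout(["a", "c", "b"], sizes))  # "a" and "c" share page 0
print(far, near)
```

In this toy case, placing the two mutually calling procedures on the same page removes every page-switch, which is the effect a placement heuristic aims for on real call traces.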

Citations: 1

Supporting islands of coherency for highly-parallel embedded architectures using compile-time virtualisation

Software and Compilers for Embedded Systems Pub Date : 2010-06-28 DOI: 10.1145/1811212.1811223

Ian Gray, N. Audsley

Abstract: As their complexity grows, the architectures of embedded systems are becoming increasingly parallel. However, the frameworks used to assist development on highly-parallel general-purpose systems (such as CORBA or MPI) are too heavyweight for use on the non-standard architectures of embedded systems. They introduce significant overheads due to the lack of architectural and structural information contained within most programming languages. Specifically, thread migration across irregular architectures can lead to very poor memory access times, and unconstrained cache coherency cannot scale to cope with large systems. This paper introduces an approach to solving these problems in a scalable way with minimal run-time overhead by using the concept of 'Islands of Coherency'. Cooperating threads are grouped into clusters along with the data that they use. These clusters can then be efficiently mapped to the target architecture, utilising migration only in the areas where the programmer explicitly declares it. This is supported through the use of an existing technique called Compile-Time Virtualisation (CTV). CTV does not support run-time dynamism, so it is extended to allow the implementation of Islands of Coherency. The presented system is evaluated experimentally through implementation on an FPGA platform. Simulation-based results are also presented that show the potential that this approach has for increasing the performance of future embedded systems.

Citations: 6

A higher-order extension for imperative synchronous languages

Software and Compilers for Embedded Systems Pub Date : 2010-06-28 DOI: 10.1145/1811212.1811222

Eric Vecchié, J. Talpin, Sébastien Boisgérault

Abstract: This article presents the very first effective design of higher-order modules in the synchronous programming language Esterel. Higher-order modules, together with the robust separate compilation scheme that implements them, allow us to address a yet unexplored application spectrum ranging from rapid prototyping of embedded functionality to hot reconfiguration of embedded software within the formal modeling framework of the "synchronous hypothesis". While extensions of data-flow synchronous languages had already been proposed for Lustre [11] and Signal [25], the adaptation of similar programming concepts to imperative synchronous frameworks like Esterel has long posed major technical challenges, due to the specificity of its model of computation. We present a framework, including a formal semantics, a type system, and a modular code generator, that tackles this challenge. We consider a specific stack-based module call convention and a simple event pooling protocol; in consequence, signals can refer to modules, and modules can be transmitted and instantiated by referencing a signal. We define a type system that computes the potential emissions of a module and prove it sound. Our type system seamlessly fits an extension of Esterel's constructive semantics with higher-order modules.

Citations: 4

System level MPSoC design: a bright future for compiler technology?

Software and Compilers for Embedded Systems Pub Date : 2010-06-28 DOI: 10.1145/1811212.1811225

R. Leupers

Citations: 0

Parallel copy motion

Software and Compilers for Embedded Systems Pub Date : 2010-06-28 DOI: 10.1145/1811212.1811214

Florent Bouchez, Quentin Colombet, A. Darte, F. Rastello, C. Guillon

Abstract: Recent results on the static single assignment (SSA) form open promising directions for the design of register allocation heuristics for just-in-time (JIT) compilation. In particular, tree-scan allocators with two decoupled phases, one for spilling and one for splitting/coloring/coalescing, seem good candidates for designing fast, memory-friendly, and competitive register allocators. Linear-scan allocators, introduced earlier, are also well-suited for JIT compilation. All do live-range splitting (mostly on control-flow edges) to avoid spilling, but most of them perform coalescing poorly, leading to many register-to-register copies inside basic blocks, but also, implicitly, on the control-flow graph edges, leading to edge splitting. This paper presents parallel copy motion, a technique for optimizing register-allocated code, which amounts to moving a group of parallel copy instructions from one program point to another. While the scheduling is shackled by data dependencies, a copy can "traverse" all instructions of a basic block, thanks to register renaming, except those with conflicting naming constraints. Also, with an adequate management of compensation code, parallel copies can be moved across edges. A first application is reducing the cost of copies by a better placement. A second application is moving copies out of critical edges, i.e., edges going from a block with multiple successors to a block with multiple predecessors. This is often beneficial compared to the alternative: splitting the edge. A direct use case is the handling of control-flow graphs with non-splittable edges, introduced by some compilers for specific architectural constraints, region boundaries, or exception handling code. Experiments with SPECint and our own benchmark suite show that an SSA-based register allocator can be applied broadly now, even for procedures with non-splittable edges: while those procedures could not be compiled before, with parallel copy motion, all moves could be pushed out of such edges. Even simple strategies for moving copies out of edges and inside basic blocks show some average improvement compared to the standard edge-splitting strategy (3% speedup), with a great reduction of the weighted number of copies (21% move cost reduction for SPECint). This leads us to believe that the approach is promising, and not only for improving coalescing in fast register allocators.
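Illustrative note (not from the paper): the abstract defines a critical edge as one going from a block with multiple successors to a block with multiple predecessors. A minimal sketch of that definition follows, assuming the control-flow graph is given as a list of (source, destination) block-name pairs. The function name `critical_edges` and the example graph are hypothetical, and the copy-motion technique itself is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]


def critical_edges(edges: Iterable[Edge]) -> List[Edge]:
    """Return edges whose source has >1 successor and whose target has >1 predecessor."""
    unique = set(edges)  # ignore duplicate edges between the same pair of blocks
    succ_count: Dict[str, int] = defaultdict(int)
    pred_count: Dict[str, int] = defaultdict(int)
    for src, dst in unique:
        succ_count[src] += 1
        pred_count[dst] += 1
    return sorted(
        (src, dst)
        for src, dst in unique
        if succ_count[src] > 1 and pred_count[dst] > 1
    )


# Hypothetical CFG: "entry" branches to "A" or directly to "B"; "A" falls through to "B".
cfg = [("entry", "A"), ("entry", "B"), ("A", "B"), ("B", "exit")]
print(critical_edges(cfg))  # [('entry', 'B')]
```

Here only the edge from "entry" to "B" is critical: a copy placed on it cannot be moved to the end of "entry" (it would also run on the path to "A") or to the start of "B" (it would also run when arriving from "A") without compensation, which is why such edges are normally split.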

Citations: 9

Interval analysis of microcontroller code using abstract interpretation of hardware and software

Software and Compilers for Embedded Systems Pub Date : 2010-06-28 DOI: 10.1145/1811212.1811216

Jörg Brauer, T. Noll, Bastian Schlich

Citations: 20

Workload characterization supporting the development of domain-specific compiler optimizations using decision trees for data mining

Software and Compilers for Embedded Systems Pub Date : 2010-06-28 DOI: 10.1145/1811212.1811219

Damon Fenacci, Björn Franke, John Thomson

Abstract: Embedded systems have successfully entered a broad variety of application domains such as automotive and industrial control, telecommunications, networking, digital media, consumer equipment, office automation and many more. In this paper we investigate whether there exist any fundamental differences between application domains that justify the development and tuning of domain-specific compilers. We develop an automated approach that is capable of identifying domain-specific workload characterizations and presenting them in a readily interpretable format based on decision trees. The generated workload profiles summarize key resource utilization issues and enable compiler engineers to address the highlighted bottlenecks. We have evaluated our methodology against the industrial EEMBC benchmark suite and three popular embedded processors and have found that workload profiles differ significantly between application domains. We demonstrate that these characteristics can be exploited for the development of domain-specific compiler optimizations. In a case study we show average performance improvements of up to 44% for a class of networking applications.

Citations: 12