Software and Compilers for Embedded Systems: Latest Publications

Decoupled graph-coloring register allocation with hierarchical aliasing
Software and Compilers for Embedded Systems, Pub Date: 2011-06-27, DOI: 10.1145/1988932.1988934
A. Tavares, Quentin Colombet, Mariza Bigonha, C. Guillon, Fernando Magno Quintão Pereira, F. Rastello
Abstract: Recent results have shown how to do graph-coloring-based register allocation in a way that decouples spilling from register assignment. The main advantage of this decoupled approach is that it simplifies the implementation of register allocators. However, the decoupled model, as described in previous works, faces many problems when dealing with register aliasing, a phenomenon typical of architectures commonly found in embedded systems, such as ARM. In this paper we introduce the semi-elementary form, a program representation that brings decoupled register allocation to architectures with register aliasing. The semi-elementary form is much smaller than the program representations used by previous decoupled solutions, which leads to register allocators that perform better in terms of time and space. Furthermore, this representation reduces the number of copies that traditional allocators insert into assembly programs. We have empirically validated our results by showing how our representation improves two well-known graph-coloring-based allocators, namely the Iterated Register Coalescer (IRC) and Bouchez et al.'s brute-force (BF) method, both augmented with Smith et al.'s extensions to handle aliasing. Running our techniques on SPEC CPU 2000, we reduced the number of nodes in the interference graphs by a factor of 4 to 5, speeding up allocation time by a factor of 3 to 5. Additionally, the semi-elementary form reduces by 8% the number of copies that IRC leaves uncoalesced.
Citations: 4
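The sketch below makes register aliasing concrete with greedy register assignment in Python. It assumes a hypothetical ARM-VFP-like register file in which each double register Dk overlaps the single registers S(2k) and S(2k+1). This is a plain greedy coloring. It is not the paper's semi-elementary form, and it is not IRC or the BF method. The register naming, the `units`/`greedy_assign` helpers, and the input format are all assumptions made for illustration.

```python
# Hypothetical register file: double register Dk aliases singles S(2k), S(2k+1).

def units(reg):
    """Return the set of single-register slots that a register occupies."""
    kind, idx = reg[0], int(reg[1:])
    return {idx} if kind == "S" else {2 * idx, 2 * idx + 1}

def candidates(reg_class, n_singles):
    """All registers of class "S" or "D" for a file of n_singles S registers."""
    if reg_class == "S":
        return [f"S{i}" for i in range(n_singles)]
    return [f"D{i}" for i in range(n_singles // 2)]

def greedy_assign(classes, interference, order, n_singles=4):
    """Visit variables in `order`. Give each one the first register whose slots
    do not overlap the slots of already-assigned interfering neighbours.
    A variable mapped to None would have to be spilled."""
    assignment = {}
    for v in order:
        busy = set()
        for u in interference.get(v, ()):
            if assignment.get(u) is not None:
                busy |= units(assignment[u])
        assignment[v] = next(
            (r for r in candidates(classes[v], n_singles) if not units(r) & busy),
            None,
        )
    return assignment
```

Take three mutually interfering variables: a (class S), b (class D), and c (class S), with four single registers. `a` takes S0, which blocks D0, so `b` lands in D1 and `c` can still use S1. With only two single registers, `b` finds no free double register and would need spilling.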
SMT-based optimization for synchronous programs
Software and Compilers for Embedded Systems, Pub Date: 2011-06-27, DOI: 10.1145/1988932.1988935
Yu Bai, J. Brandt, K. Schneider
Abstract: In this paper, we present several optimization techniques that improve the runtime and size of code generated from synchronous programs. These optimizations work on extended finite state machines (EFSMs), which can serve as an intermediate representation for any synchronous system. Our optimizations consist of two phases. First, local optimization guides the EFSM generation and considers states and edges separately. Second, global optimization is based on a dataflow analysis of the entire EFSM. For both phases, we employ an SMT (Satisfiability Modulo Theories) solver to verify the individual optimization steps. Our experiments show the potential of the presented optimizations: optimized programs generally have a smaller size and better run-time performance.
Citations: 4
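The abstract says that each optimization step is verified with an SMT solver. The sketch below uses a simple stand-in for that solver. It checks guard satisfiability and guard equivalence on EFSM edges by enumerating every combination of Boolean inputs. This only works for a few pure signals; a real SMT solver also handles data variables and scales far better. The edge tuple format `(source, target, label, guard)` and the helper names are hypothetical.

```python
from itertools import product

def environments(inputs):
    """Enumerate every True/False assignment to the Boolean input signals."""
    for values in product([False, True], repeat=len(inputs)):
        yield dict(zip(inputs, values))

def satisfiable(guard, inputs):
    """Brute-force stand-in for an SMT satisfiability query."""
    return any(guard(env) for env in environments(inputs))

def equivalent(g1, g2, inputs):
    """True if two guards agree on every input assignment."""
    return all(g1(env) == g2(env) for env in environments(inputs))

def prune_edges(edges, inputs):
    """Local optimization: drop EFSM edges whose guard can never hold."""
    return [edge for edge in edges if satisfiable(edge[3], inputs)]
```

Because `equivalent` confirms a rewrite such as `(A and B) or (A and not B)` into `A` before it is applied, each simplification step stays semantics-preserving.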
Enhanced structural analysis for C code reconstruction from IR code
Software and Compilers for Embedded Systems, Pub Date: 2011-06-27, DOI: 10.1145/1988932.1988936
Felix Engel, R. Leupers, G. Ascheid, Max Ferger, M. Beemster
Abstract: Modern compilers parse their input, which is usually a high-level programming language, and then convert the resulting parse tree into an intermediate representation (IR). This IR has the important property of being independent of both the source language and the target processor, which allows for generalized optimizations. This flexibility, however, also discards some of the high-level properties of the source language. In this paper we present an analysis that can extract, from a medium-level IR, most of the control flow structures typically found in the C programming language. Mirtoc is an implementation of this analysis for the specific case of CCMIR, the IR used in ACE's CoSy® compiler framework. A compiler based on mirtoc can emit C code that is well structured, readable by humans, and compilable by a back-end compiler with relatively low overhead. This enables the use of optimizers based on medium-level IRs in a source-to-source flow.
Citations: 11
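Structural analysis typically begins by recognizing loops in the control-flow graph. The Python sketch below shows one standard building block: it computes dominators and then finds natural loops from back edges. Each loop it finds is a candidate for a C `while` or `do`-`while` construct. This is not mirtoc's algorithm. The dictionary-based CFG format is an assumption, and every node is assumed to be reachable from the entry.

```python
def predecessors(cfg):
    preds = {n: [] for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].append(n)
    return preds

def dominators(cfg, entry):
    """Iterative data-flow computation of the dominator set of each node."""
    preds = predecessors(cfg)
    dom = {n: set(cfg) for n in cfg}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in cfg:
            if n == entry:
                continue
            pred_doms = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*pred_doms) if pred_doms else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

def natural_loops(cfg, entry):
    """Return (header, body) for every back edge tail -> header,
    i.e. every edge whose target dominates its source."""
    dom = dominators(cfg, entry)
    preds = predecessors(cfg)
    loops = []
    for tail, succs in cfg.items():
        for header in succs:
            if header not in dom[tail]:
                continue
            body, stack = {header}, []
            if tail != header:
                body.add(tail)
                stack.append(tail)
            # Walk backwards from the tail; the header bounds the search.
            while stack:
                m = stack.pop()
                for p in preds[m]:
                    if p not in body:
                        body.add(p)
                        stack.append(p)
            loops.append((header, body))
    return loops
```

An if-then-else diamond yields no loops. A CFG of the form entry, header, body, back to header yields a single loop headed by `header`.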
B2P2: bounds based procedure placement for instruction TLB power reduction in embedded systems
Software and Compilers for Embedded Systems, Pub Date: 2010-06-28, DOI: 10.1145/1811212.1811215
Reiley Jeyapaul, Aviral Shrivastava
Abstract: High-performance embedded processors are equipped with a Translation Look-aside Buffer (TLB), which is the key ingredient of efficient and fast virtual memory management. Though small, the TLB is accessed frequently. It therefore consumes significant energy and is also one of the important thermal hot-spots in the processor. Many circuit and microarchitectural techniques have been proposed to reduce TLB power consumption. Among them, the Use-Last TLB is a very efficient technique in which power is consumed only when different pages are accessed in succession, i.e., when there is a page switch [26]. Though the Use-Last technique is effective in reducing i-TLB power, its effectiveness can be improved further by changing the relative code placement of the program. In this work, we formulate the code placement problem so as to minimize the page switches in a program. We prove that this problem is NP-complete and propose an efficient Bounds Based Procedure Placement (B2P2) heuristic to reduce the program's page switches. Our procedure placement technique delivers an average 76% reduction in instruction-TLB power with negligible (< 2%) impact on performance, over and above the reduction achieved by the Use-Last TLB architecture alone.
Citations: 1
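A small cost model illustrates the quantity that B2P2 minimizes. The Python sketch below lays out procedures contiguously from address 0. It maps each fetch in a hypothetical trace of `(procedure, offset)` pairs to a page and counts page switches, the events on which a Use-Last TLB consumes power. The exhaustive search over placements is not the B2P2 heuristic; it only works for a handful of procedures, since the abstract states that the problem is NP-complete. Procedure sizes, trace format, and page size are assumptions.

```python
from itertools import permutations

def layout(placement, sizes):
    """Base address of each procedure when laid out contiguously from 0."""
    bases, addr = {}, 0
    for proc in placement:
        bases[proc] = addr
        addr += sizes[proc]
    return bases

def page_switches(trace, placement, sizes, page_size):
    """Count consecutive instruction fetches that fall on different pages."""
    bases = layout(placement, sizes)
    pages = [(bases[proc] + offset) // page_size for proc, offset in trace]
    return sum(1 for prev, cur in zip(pages, pages[1:]) if prev != cur)

def best_placement(trace, sizes, page_size):
    """Exhaustive search; only viable for very few procedures."""
    return min(permutations(sizes),
               key=lambda p: page_switches(trace, p, sizes, page_size))
```

Suppose three 60-byte procedures share 128-byte pages and A and C alternate in the trace. Placing C between A and B keeps both on page 0. Placing B in the middle pushes C onto page 1 and causes a switch on every alternation.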
Supporting islands of coherency for highly-parallel embedded architectures using compile-time virtualisation
Software and Compilers for Embedded Systems, Pub Date: 2010-06-28, DOI: 10.1145/1811212.1811223
Ian Gray, N. Audsley
Abstract: As their complexity grows, the architectures of embedded systems are becoming increasingly parallel. However, the frameworks used to assist development on highly-parallel general-purpose systems (such as CORBA or MPI) are too heavyweight for use on the non-standard architectures of embedded systems. They introduce significant overheads because most programming languages contain little architectural and structural information. Specifically, thread migration across irregular architectures can lead to very poor memory access times, and unconstrained cache coherency cannot scale to cope with large systems.
This paper introduces an approach that uses the concept of 'Islands of Coherency' to solve these problems in a scalable way with minimal run-time overhead. Cooperating threads are grouped into clusters along with the data that they use. These clusters can then be efficiently mapped to the target architecture, utilising migration only in the areas where the programmer explicitly declares it.
This is supported through an existing technique called Compile-Time Virtualisation (CTV). CTV does not support run-time dynamism, so it is extended to allow the implementation of Islands of Coherency. The presented system is evaluated experimentally through implementation on an FPGA platform. Simulation-based results are also presented that show the potential of this approach for increasing the performance of future embedded systems.
Citations: 6
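One way to picture how Islands of Coherency are formed is to group threads that transitively share data, together with that data. The Python sketch below does this with union-find. It rests on that simplifying assumption and is not the CTV-based implementation from the paper. The input format, a map from thread to the set of data objects it touches, is hypothetical.

```python
def islands_of_coherency(accesses):
    """accesses maps each thread to the set of data objects it touches.
    Threads that (transitively) share data end up in the same island."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Tag nodes so a thread and a data object with the same name never collide.
    for thread, objects in accesses.items():
        find(("thread", thread))
        for obj in objects:
            union(("thread", thread), ("data", obj))

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), []).append(node)
    return sorted(
        (tuple(sorted(n for kind, n in g if kind == "thread")),
         tuple(sorted(n for kind, n in g if kind == "data")))
        for g in groups.values()
    )
```

Each resulting island (a group of threads plus the data they use) is the unit that would then be mapped onto a region of the target architecture.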
A higher-order extension for imperative synchronous languages
Software and Compilers for Embedded Systems, Pub Date: 2010-06-28, DOI: 10.1145/1811212.1811222
Eric Vecchié, J. Talpin, Sébastien Boisgérault
Abstract: This article presents the very first effective design of higher-order modules in the synchronous programming language Esterel. Higher-order modules, together with the robust separate compilation scheme that implements them, allow us to address a yet unexplored application spectrum. This spectrum ranges from rapid prototyping of embedded functionality to hot reconfiguration of embedded software, within the formal modeling framework of the "synchronous hypothesis". Extensions of data-flow synchronous languages had already been proposed for Lustre [11] and Signal [25]. Adapting similar programming concepts to imperative synchronous frameworks like Esterel, however, has long posed major technical challenges, due to the specificity of its model of computation. We present a framework, including a formal semantics, a type system, and a modular code generator, that tackles this challenge. We consider a specific stack-based module call convention and a simple event pooling protocol. As a consequence, signals can refer to modules, and modules can be transmitted and instantiated by referencing a signal. We define a type system that computes the potential emissions of a module and prove it sound. Our type system seamlessly fits an extension of Esterel's constructive semantics with higher-order modules.
Citations: 4
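The type system described above computes the potential emissions of a module. As a rough illustration, the Python sketch below over-approximates the signals that a statement may emit. It uses a hypothetical mini-syntax loosely modeled on Esterel, in which `run` instantiates whichever module a signal may carry. This is not the paper's type system or semantics; the statement encoding and the `carried` map are assumptions.

```python
def emissions(stmt, modules, carried, active=frozenset()):
    """Over-approximate the set of signals that `stmt` may emit.

    Hypothetical statement encoding:
      ("nothing",)            does nothing
      ("emit", S)             emits signal S
      ("seq", p, q)           p then q
      ("par", p, q)           p in parallel with q
      ("present", S, p, q)    p if S is present, else q
      ("run", SIG)            runs whichever module signal SIG carries
    modules maps module names to bodies; carried maps a signal to the
    module names it may carry.
    """
    kind = stmt[0]
    if kind == "nothing":
        return set()
    if kind == "emit":
        return {stmt[1]}
    if kind in ("seq", "par"):
        return (emissions(stmt[1], modules, carried, active)
                | emissions(stmt[2], modules, carried, active))
    if kind == "present":
        # Either branch may run, so take the union of both.
        return (emissions(stmt[2], modules, carried, active)
                | emissions(stmt[3], modules, carried, active))
    if kind == "run":
        result = set()
        for name in carried.get(stmt[1], ()):
            # A module already being analysed contributes nothing new.
            if name not in active:
                result |= emissions(modules[name], modules, carried, active | {name})
        return result
    raise ValueError(f"unknown statement kind {kind!r}")
```

`present` takes the union of both branches, and `run` takes the union over every module a signal may carry. In this toy setting, the result is therefore a sound over-approximation, even for modules that reference themselves through a signal.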
System level MPSoC design: a bright future for compiler technology?
Software and Compilers for Embedded Systems, Pub Date: 2010-06-28, DOI: 10.1145/1811212.1811225
R. Leupers
Abstract: Looking back at the history of SCOPES, compiler research for embedded processors started out in the 1990s with two major ambitions: (1) more architecture-aware code optimizations to better support specialized target machines such as DSPs, and (2) higher flexibility to enable compiler retargeting over a wide range of machines. These research efforts have led to numerous results, many of which are part of industrial products today. So, what is left to do in embedded compilers, and who (in a world with "free" tools like GCC and LLVM) will pay for them? Naturally, the evolution of embedded processor architectures demands a never-ending stream of code optimization innovations. However, we argue that the current trend towards ESL design of embedded MPSoC platforms opens up the most promising new opportunities for compiler research, going far beyond the obvious problem of sequential code partitioning. Increasingly complex software stacks, consolidation of the MPSoC platform market, and higher design abstraction levels create many interesting new use cases for compiler technology, some of which will be highlighted in this presentation.
Citations: 0
Parallel copy motion
Software and Compilers for Embedded Systems, Pub Date: 2010-06-28, DOI: 10.1145/1811212.1811214
Florent Bouchez, Quentin Colombet, A. Darte, F. Rastello, C. Guillon
Abstract: Recent results on the static single assignment (SSA) form open promising directions for the design of register allocation heuristics for just-in-time (JIT) compilation. In particular, tree-scan allocators with two decoupled phases, one for spilling and one for splitting/coloring/coalescing, seem good candidates for designing fast, memory-friendly, and competitive register allocators. Linear-scan allocators, introduced earlier, are also well suited to JIT compilation. All of them perform live-range splitting (mostly on control-flow edges) to avoid spilling, but most of them coalesce poorly. This leaves many register-to-register copies inside basic blocks and also, implicitly, on control-flow graph edges, which leads to edge splitting.
This paper presents parallel copy motion, a technique for optimizing register-allocated code that consists of moving a group of parallel copy instructions from one program point to another. While scheduling is shackled by data dependencies, a copy can "traverse" all instructions of a basic block, thanks to register renaming, except those with conflicting naming constraints. Also, with adequate management of compensation code, parallel copies can be moved across edges. A first application is reducing the cost of copies through better placement. A second application is moving copies out of critical edges, i.e., edges going from a block with multiple successors to a block with multiple predecessors. This is often beneficial compared to the alternative: splitting the edge. A direct use case is the handling of control-flow graphs with non-splittable edges, which some compilers introduce for specific architectural constraints, region boundaries, or exception handling code.
Experiments with SPECint and our own benchmark suite show that an SSA-based register allocator can now be applied broadly, even to procedures with non-splittable edges. Those procedures could not be compiled before; with parallel copy motion, all moves could be pushed out of such edges. Even simple strategies for moving copies out of edges and inside basic blocks show some average improvement over the standard edge-splitting strategy (3% speedup), with a great reduction in the weighted number of copies (21% move cost reduction for SPECint). This leads us to believe that the approach is promising, and not only for improving coalescing in fast register allocators.
Citations: 9
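A parallel copy reads all of its sources before it writes any destination. Emitting it as ordinary sequential moves therefore needs care when the copies form a cycle, such as a swap. The Python sketch below shows a common way to sequentialize a parallel copy using one spare register. It illustrates parallel-copy semantics only and is not the paper's copy-motion algorithm. The `temp` register is assumed to be free (not a source of any copy), and the dictionary format is hypothetical.

```python
def sequentialize(parallel_copy, temp):
    """parallel_copy maps destination -> source (all reads happen before any
    write). Returns an equivalent list of sequential (dst, src) moves."""
    pending = {d: s for d, s in parallel_copy.items() if d != s}
    moves = []
    while pending:
        sources = set(pending.values())
        # A destination that no pending copy still reads can be written now.
        ready = next((d for d in pending if d not in sources), None)
        if ready is not None:
            moves.append((ready, pending.pop(ready)))
        else:
            # Every remaining destination is still needed as a source (a cycle):
            # save one of them in temp and redirect its readers to temp.
            victim = next(iter(pending))
            moves.append((temp, victim))
            for dst, src in list(pending.items()):
                if src == victim:
                    pending[dst] = temp
    return moves

def simulate(moves, regs):
    """Execute moves one after another on a copy of the register state."""
    regs = dict(regs)
    for dst, src in moves:
        regs[dst] = regs[src]
    return regs
```

For the swap `{"a": "b", "b": "a"}`, the sketch produces the familiar three moves through `temp`. Copies that simply fan out a value are emitted first, without touching `temp`.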
Interval analysis of microcontroller code using abstract interpretation of hardware and software
Software and Compilers for Embedded Systems, Pub Date: 2010-06-28, DOI: 10.1145/1811212.1811216
Jörg Brauer, T. Noll, Bastian Schlich
Abstract: Static analysis is often performed on source code, where intervals (possibly the most widely used numeric abstract domain) have been used successfully as a program abstraction for decades. Binary code on microcontroller platforms, however, differs from high-level code: data is frequently altered using bitwise operations, and the results of operations often depend on the hardware configuration. To handle these peculiarities, we describe a method that combines word- and bit-level interval analysis and integrates a hardware model by means of abstract interpretation. Moreover, we show that this method is powerful enough to derive invariants that could so far only be verified using computationally more expensive techniques such as model checking.
Citations: 20
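The sketch below shows how an interval domain can handle the bitwise operations the abstract mentions, assuming 8-bit unsigned registers. Masking with a constant is bounded by both the operand and the mask. An addition that may wrap around is conservatively widened to the full range. This is a word-level illustration only and does not reproduce the paper's combined word- and bit-level analysis or its hardware model.

```python
from dataclasses import dataclass

WORD_MAX = 0xFF  # assumption: 8-bit unsigned registers

@dataclass(frozen=True)
class Interval:
    lo: int
    hi: int

TOP = Interval(0, WORD_MAX)  # "any 8-bit value"

def join(a, b):
    """Least upper bound: the smallest interval covering both operands."""
    return Interval(min(a.lo, b.lo), max(a.hi, b.hi))

def add(a, b):
    """Unsigned 8-bit addition; if the sum may wrap around, return TOP."""
    lo, hi = a.lo + b.lo, a.hi + b.hi
    return Interval(lo, hi) if hi <= WORD_MAX else TOP

def and_const(a, mask):
    """x & mask never exceeds x or mask and is never negative."""
    if a.lo == a.hi:  # exact value known: compute it directly
        v = a.lo & mask
        return Interval(v, v)
    return Interval(0, min(a.hi, mask))
```

Masking a register known to lie in [0, 200] with 0x0F yields [0, 15]. Adding 10 to [200, 250] may overflow, so the result is TOP.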
Workload characterization supporting the development of domain-specific compiler optimizations using decision trees for data mining
Software and Compilers for Embedded Systems, Pub Date: 2010-06-28, DOI: 10.1145/1811212.1811219
Damon Fenacci, Björn Franke, John Thomson
Abstract: Embedded systems have successfully entered a broad variety of application domains, such as automotive and industrial control, telecommunications, networking, digital media, consumer equipment, office automation, and many more. In this paper we investigate whether there are any fundamental differences between application domains that justify the development and tuning of domain-specific compilers. We develop an automated approach that identifies domain-specific workload characterizations and presents them in a readily interpretable format based on decision trees. The generated workload profiles summarize key resource utilization issues and enable compiler engineers to address the highlighted bottlenecks. We have evaluated our methodology against the industrial EEMBC benchmark suite and three popular embedded processors, and have found that workload profiles differ significantly between application domains. We demonstrate that these characteristics can be exploited for the development of domain-specific compiler optimizations. In a case study we show average performance improvements of up to 44% for a class of networking applications.
Citations: 12
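A decision tree splits workloads on the feature that best separates the classes. The Python sketch below builds a single-level tree (a decision stump) using Gini impurity. Its inputs are made-up, hypothetical per-benchmark features, such as the fraction of memory instructions and the fraction of branches. It illustrates the splitting criterion only and does not reproduce the paper's data, features, or tool flow.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def best_stump(samples, labels):
    """Choose the (feature, threshold) split with the lowest weighted Gini
    impurity. Returns (feature, threshold, label_if_le, label_if_gt)."""
    best = None
    n = len(samples)
    for f in range(len(samples[0])):
        values = sorted({s[f] for s in samples})
        # Midpoints between distinct values guarantee both sides are non-empty.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for s, y in zip(samples, labels) if s[f] <= t]
            right = [y for s, y in zip(samples, labels) if s[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    return best[1:]
```

On toy data where only the first feature separates "net" from "auto" benchmarks, the stump picks that feature, and its threshold falls between the two groups.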