Software and Compilers for Embedded Systems: Latest Literature

Efficient event-driven simulation of parallel processor architectures
Software and Compilers for Embedded Systems Pub Date : 2007-04-20 DOI: 10.1145/1269843.1269854
A. Kupriyanov, D. Kissler, Frank Hannig, J. Teich
Abstract: In this paper we present a new approach for generating high-speed optimized event-driven instruction set level simulators for adaptive massively parallel processor architectures. The simulator generator is part of a methodology for the systematic mapping, evaluation, and exploration of massively parallel processor architectures that are designed for special purpose applications in the world of embedded computers. The generation of high-speed cycle-accurate simulators is of utmost importance here, because they are used directly both for debugging and evaluating parallel processor architectures and during time-consuming architecture/compiler co-exploration. We developed a modeling environment which automatically generates a C++ simulation model either from a graphical input or directly from an XML-based architecture description. Here, we focus on the underlying event-driven simulation model and present our modeling environment, in particular the features of the graphical parallel processor architecture editor and the automatic instruction set level simulator generator. Finally, in a case study, we demonstrate the pertinence of our approach by simulating different processor arrays. The superior performance of the generated simulators compared to existing simulators and simulator generation approaches is shown.
Citations: 17
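The abstract above refers to an event-driven simulation model without describing its internals. As a rough illustration of the general idea only (not the authors' generated C++ simulator), the following Python sketch shows a minimal event kernel in which work happens only when a scheduled event fires. The class name `EventSimulator`, the two processing elements and the 2-cycle delay are all hypothetical.

```python
import heapq


class EventSimulator:
    """Minimal discrete-event kernel: work is done only when an event fires."""

    def __init__(self):
        self.now = 0      # current simulation time (e.g. clock cycles)
        self._queue = []  # min-heap of (time, sequence, callback)
        self._seq = 0     # tie-breaker keeps same-time events in FIFO order

    def schedule(self, delay, callback):
        """Register callback to run `delay` time units after the current time."""
        heapq.heappush(self._queue, (self.now + delay, self._seq, callback))
        self._seq += 1

    def run(self):
        """Pop events in time order until none remain."""
        while self._queue:
            self.now, _, callback = heapq.heappop(self._queue)
            callback(self)


def simulate_pipeline():
    """Hypothetical two-element array: PE0 produces a value, PE1 consumes it."""
    log = []

    def pe0_fire(sim):
        log.append((sim.now, "PE0 output"))
        sim.schedule(2, pe1_fire)  # assumed 2-cycle interconnect delay

    def pe1_fire(sim):
        log.append((sim.now, "PE1 input"))

    sim = EventSimulator()
    sim.schedule(1, pe0_fire)
    sim.run()
    return log
```

Calling `simulate_pipeline()` returns the ordered event log. Cycles in which nothing changes cost nothing, which is the usual argument for event-driven over cycle-stepped simulation.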
Operating system integrated energy aware scratchpad allocation strategies for multiprocess applications
Software and Compilers for Embedded Systems Pub Date : 2007-04-20 DOI: 10.1145/1269843.1269850
R. Pyka, Christoph Faßbach, Manish Verma, H. Falk, P. Marwedel
Abstract: Various scratchpad allocation strategies have been developed in the past. Most of them target the reduction of energy consumption. These approaches share the necessity of having direct access to the scratchpad memory. In earlier embedded systems this was always true, but with the increasing complexity of the tasks that systems have to perform, an additional operating system layer between the hardware and the application is becoming mandatory. This paper presents an approach to integrate a scratchpad memory manager into the operating system. The goal is to minimize energy consumption. In contrast to previous work, compile time knowledge about the application's behavior is taken into account. A set of fast heuristic allocation methods is proposed in this paper. An in-depth study and comparison of achieved energy savings and cycle reductions was performed. The results show that even in the highly dynamic environment of an operating system equipped embedded system, up to 83% energy consumption reduction can be achieved.
Citations: 29
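The abstract above mentions fast heuristic allocation methods but does not spell them out. To give a feel for what such a heuristic can look like, here is a generic knapsack-style greedy in Python. It is an assumption for illustration and not one of the paper's methods: the function name `allocate_scratchpad`, the tuple layout and the energy figures are all hypothetical.

```python
def allocate_scratchpad(objects, capacity):
    """Greedy heuristic: pick objects by energy saved per byte until full.

    objects  -- list of (name, size_bytes, energy_saved) tuples (illustrative)
    capacity -- scratchpad size in bytes
    Returns (names placed in the scratchpad, bytes used).
    """
    chosen, used = [], 0
    # Rank by benefit density so small, hot objects are considered first.
    ranked = sorted(objects, key=lambda o: o[2] / o[1], reverse=True)
    for name, size, _saving in ranked:
        if used + size <= capacity:
            chosen.append(name)
            used += size
    return chosen, used
```

With a 1024-byte scratchpad and candidates `("buf", 512, 900.0)`, `("lut", 256, 800.0)` and `("stack", 1024, 1000.0)`, the sketch places `lut` and `buf` and leaves `stack` in main memory. A greedy like this is fast but not guaranteed optimal.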
Whole-program linear-constant analysis with applications to link-time optimization
Software and Compilers for Embedded Systems Pub Date : 2007-04-20 DOI: 10.1145/1269843.1269853
L. V. Put, Dominique Chanet, K. D. Bosschere
Abstract: Current link-time optimization techniques can reduce the power consumption and code size of embedded software [2]. Due to a lack of information, the stack frames of procedures are left untouched by link-time program optimizers. In this paper we present a practical whole-program linear-constant analysis [9] that makes it possible to analyze the stack layout of a procedure. The analysis deals with the peculiarities of link-time program representation, namely the lack of high-level information and the huge size of the control flow graph. Even on a complete Linux kernel, our analysis is practical in terms of computation time. The collected information consists of restricted affine equations between two registers, but it enables optimizations complementary to existing link-time optimization techniques. On a set of ARM benchmarks, the number of store operations decreases by up to 7% while the execution time, program size and power consumption are all further improved. This paper discusses both the practical issues of applying whole-program linear-constant propagation and its use in program optimization and understanding.
Citations: 2
Optimal chain rule placement for instruction selection based on SSA graphs
Software and Compilers for Embedded Systems Pub Date : 2007-04-20 DOI: 10.1145/1269843.1269857
Stefan Schäfer, Bernhard Scholz
Abstract: Instruction selection is a compiler optimisation that translates the intermediate representation of a program into a lower intermediate representation or an assembler program. We use the SSA form as an intermediate representation for instruction selection. Patterns are used for translation and are expressed as production rules in a graph grammar. The instruction selector seeks a syntax derivation with minimal costs, optimising execution time, code size, or a combination of both. Production rules are either base rules, which match nodes in the SSA graph, or chain rules, which convert results of operations.
We present a new algorithm for placing chain rules in a control flow graph. This new algorithm places chain rules optimally for an arbitrary cost metric. Experiments with the MiBench and SPEC2000 benchmark suites show that our proposed algorithm is feasible and always yields better results than simple strategies currently in use. We reduce the costs for placing chain rules by 25% for the MiBench suite and by 11% for the SPEC2000 suite.
Citations: 6
Reducing fine-grain communication overhead in multithread code generation for heterogeneous MPSoC
Software and Compilers for Embedded Systems Pub Date : 2007-04-20 DOI: 10.1145/1269843.1269855
L. Brisolara, Sang-Il Han, X. Guerin, L. Carro, R. Reis, S. Chae, A. Jerraya
Abstract: Heterogeneous MPSoCs present unique opportunities for emerging embedded applications, which require both high performance and programmability. However, software programming for these MPSoC architectures involves tedious and error-prone tasks, so automatic code generation tools are required. A code generation method based on fine-grain specification can provide more design space and optimization opportunities, such as exploiting fine-level parallelism and more efficient partitions. However, when partitioned, fine-grain models may require a large number of inter-processor communications, decreasing the overall system performance. This paper presents a Simulink-based multithread code generation method, which applies the Message Aggregation optimization technique to reduce the number of inter-processor communications. This technique reduces the communication overhead in terms of execution time by reducing the number of messages exchanged, and in terms of memory size by reducing the number of channels. The paper also presents experimental results for one multimedia application, showing the performance improvements and memory reduction obtained with the Message Aggregation technique.
Citations: 17
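The Message Aggregation idea in the abstract above (fewer messages and fewer channels between processors) can be illustrated with a small Python sketch. This is a simplified assumption rather than the paper's Simulink-based flow: it merges every message that shares a source and destination processor, and it ignores the data-dependence and scheduling constraints a real code generator would have to respect. The function name and tuple layout are hypothetical.

```python
from collections import defaultdict


def aggregate_messages(messages):
    """Merge messages that share (source, destination) into one message.

    messages -- list of (src_cpu, dst_cpu, payload) tuples in program order
    Returns one (src_cpu, dst_cpu, [payloads...]) entry per processor pair,
    so each pair needs a single channel and a single transfer.
    """
    grouped = defaultdict(list)
    for src, dst, payload in messages:
        grouped[(src, dst)].append(payload)
    return [(src, dst, payloads) for (src, dst), payloads in grouped.items()]
```

Three fine-grain messages, two of them from `cpu0` to `cpu1`, collapse into two aggregated transfers. The payload order within each pair is preserved.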
Optimization of dynamic data structures in multimedia embedded systems using evolutionary computation
Software and Compilers for Embedded Systems Pub Date : 2007-04-20 DOI: 10.1145/1269843.1269849
David Atienza Alonso, Christos Baloukas, Lazaros Papadopoulos, C. Poucet, S. Mamagkakis, J. Hidalgo, F. Catthoor, D. Soudris, J. Lanchares
Abstract: Embedded consumer devices are increasing their capabilities and can now implement new multimedia applications reserved only for powerful desktops a few years ago. These applications share complex and intensive dynamic memory use. Thus, dynamic memory optimizations are a requirement when porting these applications. Within these optimizations, the refinement of the Dynamically (de)allocated Data Type (or DDT) implementations is one of the most important and difficult parts for an efficient mapping onto low-power embedded devices.
In this paper, we describe a new automatic optimization approach for the DDTs of object-oriented multimedia applications. It is based on an analytical pre-characterization of the possible elementary DDT blocks, and a multi-objective genetic algorithm to explore the design space and to select the best implementation according to different optimization criteria (i.e., memory accesses, memory footprint and energy consumption). Our results in real-life multimedia applications show that the best implementations of DDTs can be obtained in an automated way in a few hours, while designers would typically require days to find a suitable implementation, achieving important savings in exploration time with respect to other state-of-the-art heuristics-based optimization methods for this task.
Citations: 11
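Selecting "the best implementation according to different optimization criteria", as the abstract above puts it, usually means keeping the candidates that no other candidate beats on every criterion. The Python sketch below shows only that Pareto-dominance filter, not the genetic algorithm itself. The candidate names and cost tuples (memory accesses, footprint, energy) are invented for illustration.

```python
def pareto_front(candidates):
    """Return names of candidates not dominated by any other candidate.

    candidates -- dict mapping name -> tuple of objective values,
                  where every objective is to be minimized
    """
    def dominates(a, b):
        # a dominates b if it is no worse everywhere and better somewhere.
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))

    return sorted(
        name for name, cost in candidates.items()
        if not any(dominates(other, cost) for other in candidates.values())
    )
```

Given `array = (100, 40, 9.0)`, `list = (300, 60, 12.0)` and `tree = (80, 90, 8.0)`, the filter drops `list` because `array` is better on all three criteria. It keeps both `array` and `tree`, which trade footprint against accesses and energy.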
Automatic partitioning and mapping of stream-based applications onto the Intel IXP Network processor
Software and Compilers for Embedded Systems Pub Date : 2007-04-20 DOI: 10.1145/1269843.1269847
Sjoerd Meijer, B. Kienhuis, J. Walters, David Snuijf
Abstract: When studying the IXP Network processor architecture from Intel, we found quite a few interesting aspects that make the IXP attractive for stream-based applications. The architecture is highly optimized for streaming data, albeit in the form of internet packets. Furthermore, the architecture has Gigabit Ethernet connectors for handling incoming and outgoing traffic and can process this data in real time using dedicated microengines. In this paper, we try to answer three questions: 1) Can we use the IXP architecture for stream-based applications? 2) Can we map applications written as a KPN onto an IXP? 3) Can we integrate the generation of KPNs using the Compaan compiler with a tool flow that maps the KPN to an IXP, thereby making the programming of the IXP much simpler? As will be shown, all three steps can be performed, and we show that we can automatically map two non-internet stream-based applications (QR and DWT) onto the IXP.
Citations: 14
Systematic intermediate sequence removal for reduced memory accesses
Software and Compilers for Embedded Systems Pub Date : 2007-04-20 DOI: 10.1145/1269843.1269851
C. Poucet, S. Mamagkakis, David Atienza Alonso, F. Catthoor
Abstract: Modern software applications are growing in complexity and demand very intensive use of data. Therefore, a wide variety of data structures are utilized to facilitate the storage of and access to these vast amounts of computed information. Additionally, the need for reliable software design and the development of large applications following the object-oriented paradigm increase the number of dynamic buffers and redundant accesses to the data stored in these buffers. In this paper, we propose a systematic design optimization methodology to remove these intermediate dynamic buffers, thereby reducing the memory accesses of the targeted applications without altering the input-output behaviour of the algorithms. The reduction is focused on sequences and is especially relevant for embedded systems, which have limited on-chip communication bandwidth and in which the energy consumption of the memory subsystem is high, due to the energy cost associated with each memory access. The effectiveness of the proposed methodology is assessed in a 3D reconstruction multimedia application and shows a significant reduction in memory accesses. In addition, the general trends for memory improvement and the scalability of our approach are supported by a parameterized benchmark set.
Citations: 0
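The effect described in the abstract above, removing intermediate buffers without changing input-output behaviour, can be shown on a toy pipeline. The Python sketch below is a hand-written before/after pair under assumed stages (scale, filter, sum). It does not reproduce the paper's systematic methodology, only the kind of transformation it targets.

```python
def with_intermediates(values):
    """Baseline: each stage materializes a full temporary sequence."""
    scaled = [v * 2 for v in values]          # intermediate buffer 1
    kept = [v for v in scaled if v % 3 == 0]  # intermediate buffer 2
    return sum(kept)


def fused(values):
    """Fused: a single pass with no temporary sequences, same result."""
    total = 0
    for v in values:
        s = v * 2
        if s % 3 == 0:
            total += s
    return total
```

Both functions compute the same value for any input, but the fused version never writes or re-reads the `scaled` and `kept` buffers. That is where the saving in memory accesses comes from.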
Interference graphs for procedures in static single information form are interval graphs
Software and Compilers for Embedded Systems Pub Date : 2007-04-20 DOI: 10.1145/1269843.1269858
P. Brisk, M. Sarrafzadeh
Abstract: Static Single Information (SSI) Form is a compiler intermediate representation that extends the better-known Static Single Assignment (SSA) Form. In 2005, several research groups independently proved that interference graphs for procedures represented in SSA Form are chordal graphs. This paper performs a similar analysis concerning SSI Form, and proves that interference graphs are interval graphs. The primary consequences of this paper are threefold: (1) Linear scan register allocation for programs in SSI Form can be implemented in such a way that there are no lifetime holes, thereby sidestepping one of the drawbacks that plagued non-SSI implementations; (2) the k-colorable subgraph problem can be solved in polynomial time for interval graphs, but remains NP-Complete for chordal graphs; to date, no register allocation algorithms have been implemented that solve the k-colorable subgraph problem directly; and (3) liveness analysis converges after a single iteration for programs represented in SSI Form.
Citations: 9
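One reason interval graphs are convenient for register allocation, as the abstract above suggests, is that a single left-to-right sweep colors them with the minimum number of colors. The Python sketch below shows that classic greedy sweep on live ranges given as half-open `(start, end)` pairs. It colors all intervals and does not address the k-colorable subgraph (spilling) problem the abstract also mentions. The function name and input format are assumptions.

```python
import heapq


def color_intervals(intervals):
    """Assign a register index to each live range (start, end), end exclusive.

    Sweeps in order of start point and reuses the lowest-numbered register
    whose previous interval has already ended. On interval graphs this greedy
    sweep uses the minimum possible number of registers.
    """
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
    active = []  # min-heap of (end, register) for currently live ranges
    free = []    # min-heap of registers released so far
    assignment = [None] * len(intervals)
    next_reg = 0
    for i in order:
        start, end = intervals[i]
        while active and active[0][0] <= start:  # expire finished ranges
            _, reg = heapq.heappop(active)
            heapq.heappush(free, reg)
        if free:
            reg = heapq.heappop(free)
        else:
            reg = next_reg
            next_reg += 1
        assignment[i] = reg
        heapq.heappush(active, (end, reg))
    return assignment
```

For the live ranges `(0, 4)`, `(1, 3)`, `(3, 6)` and `(4, 7)`, at most two ranges overlap at any point, and the sweep accordingly needs only two registers.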
Improvements to the Psi-SSA representation
Software and Compilers for Embedded Systems Pub Date : 2007-04-20 DOI: 10.1145/1269843.1269859
F. D. Ferrière
Abstract: Modern compiler implementations use the Static Single Assignment representation [5] as a way to efficiently implement optimizing algorithms. However, this representation is not well adapted to architectures with a predicated instruction set. The ψ-SSA representation was first introduced in [11] as an extension to the Static Single Assignment representation. The ψ-SSA representation extends the SSA representation such that standard SSA algorithms can be easily adapted to an architecture with a fully predicated instruction set. A new pseudo operation, the ψ operation, is introduced to merge several conditional definitions into a unique definition.
This paper presents an adaptation of the ψ-SSA representation to support architectures with a partially predicated instruction set. The definition of the ψ operation is extended to support the generation and the optimization of partially predicated code. In particular, a predicate promotion transformation is introduced to reduce the number of predicated operations, as well as the number of operations used to compute guard registers. An out of ψ-SSA algorithm is also described, which fixes and improves the algorithm described in [11]. This algorithm is derived from the out of SSA algorithm from Sreedhar et al. [10], where the definitions of liveness and interferences have been extended for the ψ operations. This algorithm inserts predicated copy operations to restore the correct semantics in the program in a non-SSA form.
The ψ-SSA representation is used in our production compilers, based on the Open64 technology, for the ST200 family processors. In this compiler, predicated code is generated by an if-conversion algorithm performed under the ψ-SSA representation [12, 1].
Citations: 4