Latest publications from the 2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)

Scrubbing unit repositioning for fast error repair in FPGAs
G. Nazar, Leonardo P. Santos, L. Carro
{"title":"Scrubbing unit repositioning for fast error repair in FPGAs","authors":"G. Nazar, Leonardo P. Santos, L. Carro","doi":"10.1109/CASES.2013.6662506","DOIUrl":"https://doi.org/10.1109/CASES.2013.6662506","url":null,"abstract":"Field Programmable Gate Arrays (FPGAs) are very successful platforms that rely on large configuration memories to store the circuit functions required by users. Faults affecting such memories are a major dependability threat for these devices, and the applicability of FPGAs on critical systems depends on efficient means to mitigate their effects. The main means to effectively remove such faults, namely configuration scrubbing, consists in rewriting the desired contents of this memory and suffers from high power consumption and a long mean time to repair (MTTR). In this work we propose Scrubbing Unit Repositioning for Fast Error Repair (SURFER), a novel approach to exploit partial dynamic reconfiguration coupled with fine-grained redundancy to greatly reduce the MTTR for FPGAs subject to upsets in their configuration memories.","PeriodicalId":354180,"journal":{"name":"2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122000717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 14
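To make the repositioning idea concrete, here is a minimal Python sketch of error-directed scrubbing: rather than sweeping configuration frames strictly in order, the scrubber jumps to the region flagged by error detection so a corrupted frame is rewritten sooner, shortening MTTR. The frame list, the `error_hint` parameter and the `scrub` function are illustrative inventions, not the SURFER hardware or its actual repair flow.

```python
# Hypothetical sketch of repositionable scrubbing: jump the scrubbing unit to
# the region flagged by error detection instead of always sweeping in order.

def scrub(frames, golden, error_hint=None):
    """Rewrite configuration frames, starting near a reported error if any."""
    order = list(range(len(frames)))
    if error_hint is not None:
        # Reposition: visit the suspected region first, then the rest.
        order.sort(key=lambda i: abs(i - error_hint))
    repaired = 0
    for i in order:
        if frames[i] != golden[i]:
            frames[i] = golden[i]      # rewrite the corrupted frame
            repaired += 1
    return repaired

# Example: an upset near frame 7 is repaired on the first write instead of
# after a full sequential sweep.
golden = [0xA5] * 16
frames = list(golden)
frames[7] ^= 0x01                      # injected single-bit upset
print(scrub(frames, golden, error_hint=7))   # -> 1
```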
Fault detection and recovery efficiency co-optimization through compile-time analysis and runtime adaptation
Hao Chen, Chengmo Yang
{"title":"Fault detection and recovery efficiency co-optimization through compile-time analysis and runtime adaptation","authors":"Hao Chen, Chengmo Yang","doi":"10.1109/CASES.2013.6662528","DOIUrl":"https://doi.org/10.1109/CASES.2013.6662528","url":null,"abstract":"The ever scaling-down feature size and noise margin keep elevating hardware failure rates, requiring the incorporation of fault tolerance into computer systems. One fault tolerance scheme that receives a lot of research attention is redundant execution. However, existing solutions are developed under the assumption that the fault rate is low. These techniques either solely focus on fault detection, or sometimes even increase recovery cost to reduce fault detection overhead. The lack of overall efficiency makes them insufficient and inappropriate for embedded systems with tight energy and cost budget. Our study shows that checkpoint frequency and fault rate are two critical parameters determining the overall fault detection and recovery overhead. To co-optimize detection and recovery, we statically construct a mathematical model, capable of taking application and architecture characteristics into consideration and identifying the optimal checkpoint frequency of an application for a given fault rate. Moreover, as the fault rate is infeasible to predict a priori, we furthermore propose a set of heuristics, enabling the system to dynamically monitor the fault rate and adapt the checkpoint frequency accordingly. The efficacy of the static and the adaptive optimizations is evaluated through detailed instructionlevel simulation. The results show that the optimal checkpoint frequency identified by the static model is very close to the actual value (6% deviation) and the run-time adaptation scheme effectively reduces the overhead caused by the unpredictability in fault rate.","PeriodicalId":354180,"journal":{"name":"2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116012695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 9
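The trade-off the static model captures can be illustrated with a generic first-order checkpointing model (essentially Young's approximation), not the paper's architecture-aware formulation: checkpointing more often adds overhead per unit of work but bounds the re-execution lost to a fault. The cost function and parameters below are placeholders for illustration only.

```python
# Generic model: with checkpoint cost C seconds and fault rate lam
# faults/second, the overhead per unit time for a checkpoint interval tau is
# roughly the checkpointing cost plus the expected rollback work.
import math

def overhead(tau, C, lam):
    return C / tau + lam * tau / 2.0

def optimal_interval(C, lam):
    # d(overhead)/d(tau) = 0  =>  tau_opt = sqrt(2*C/lam)
    return math.sqrt(2.0 * C / lam)

C, lam = 0.5, 1e-4                 # illustrative numbers only
tau = optimal_interval(C, lam)
print(tau, overhead(tau, C, lam))  # 100.0, 0.01
```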
Platform-dependent code generation for embedded real-time software
Baekgyu Kim, L. T. Phan, O. Sokolsky, Insup Lee
{"title":"Platform-dependent code generation for embedded real-time software","authors":"Baekgyu Kim, L. T. Phan, O. Sokolsky, Insup Lee","doi":"10.1109/cases.2013.6662512","DOIUrl":"https://doi.org/10.1109/cases.2013.6662512","url":null,"abstract":"Code generation for embedded systems is challenging, since the generated code (e.g., C code) is expected to run on a heterogeneous set of target platforms with different characteristics, such as hardware/software architectures and programming interfaces. We propose a code generation framework that provides the flexibility to generate different source code that is executable on each target platform. In our framework, the platform-dependent characteristics of a target platform are explicitly specified by an Architectural Analysis Description Language (AADL) model and a code snippet repository. The AADL model captures hardware/software architectural aspects of the platform, such as periodic/aperiodic threads and their interactions with sensors and actuators. The code snippet repository contains platform-dependent code snippets that are categorized according to the functions required to implement the components of the AADL model. These two elements of the platform capability are then used by the code generation algorithm to generate platform-dependent code for the given platform. We demonstrate the applicability of our framework using a case study of code generation for two infusion pump systems.","PeriodicalId":354180,"journal":{"name":"2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114959647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 26
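A minimal sketch of the repository idea, with invented keys and snippets: platform-dependent fragments are filed under (platform, AADL component kind) and the generator stitches together the ones matching the target. None of the names below come from the paper's framework.

```python
# Hypothetical snippet repository keyed by (platform, AADL component kind).
SNIPPETS = {
    ("platform_a", "periodic_thread"): "timer_create(...);            /* 10 ms period */",
    ("platform_b", "periodic_thread"): "osal_task_spawn(...);         /* 10 ms period */",
    ("platform_a", "actuator_write"):  "dac_write(PUMP_CHANNEL, rate);",
    ("platform_b", "actuator_write"):  "pump_driver_set_rate(rate);",
}

def emit(platform, components):
    """Assemble platform-specific C fragments for the given AADL components."""
    return "\n".join(SNIPPETS[(platform, kind)] for kind in components)

model = ["periodic_thread", "actuator_write"]   # component kinds from an AADL model
print(emit("platform_a", model))
print(emit("platform_b", model))
```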
CAeSaR: Unified cluster-assignment scheduling and communication reuse for clustered VLIW processors
Vasileios Porpodas, Marcelo H. Cintra
{"title":"CAeSaR: Unified cluster-assignment scheduling and communication reuse for clustered VLIW processors","authors":"Vasileios Porpodas, Marcelo H. Cintra","doi":"10.1109/CASES.2013.6662513","DOIUrl":"https://doi.org/10.1109/CASES.2013.6662513","url":null,"abstract":"Clustered architectures have been proposed as a solution to the scalability problem of wide ILP processors. VLIW architectures, being wide-issue by design, benefit significantly from clustering. Such architectures, being both statically scheduled and clustered, require specialized code generation techniques, as they require explicit Inter-Cluster Copy instructions (ICCs) be scheduled in the code stream. In this work we propose CAeSaR, a novel instruction scheduling algorithm that improves code generation for such architectures. It combines cluster assignment, instruction scheduling and inter-cluster communication reuse all in one single unified algorithm. The proposed algorithm improves performance by any phase-ordering issues among these three code generation and optimization steps. We evaluate CAeSaR on the MediabenchII and SPEC CINT2000 benchmarks and compare it against the state-of-the-art instruction scheduling algorithm. Our results show an improvement in execution time of up to 20.3%, and 13.8% on average, over the current state-of-the-art across the benchmarks.","PeriodicalId":354180,"journal":{"name":"2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127173080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
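The following toy greedy pass illustrates what "unified" means here: for each instruction, the cluster choice and the need for an ICC are evaluated together, and a value already copied to a cluster is reused instead of generating a second ICC. The cost model and data structures are invented for illustration and are far simpler than CAeSaR's.

```python
def schedule(instrs, deps, clusters=2):
    """Greedy pass: pick each instruction's cluster while accounting for ICCs,
    reusing copies already made to that cluster."""
    placed, copies, iccs = {}, set(), 0
    for i in instrs:                              # assumed topologically sorted
        best_cost, best_cluster = None, None
        for c in range(clusters):
            cost = sum(1 for d in deps.get(i, [])
                       if placed[d] != c and (d, c) not in copies)
            if best_cost is None or cost < best_cost:
                best_cost, best_cluster = cost, c
        placed[i] = best_cluster
        for d in deps.get(i, []):
            if placed[d] != best_cluster:
                copies.add((d, best_cluster))     # the copied value stays reusable
        iccs += best_cost
    return placed, iccs

# Tiny DAG; with two clusters and no resource pressure everything lands on
# cluster 0 and no ICCs are charged.
print(schedule(["a", "b", "c", "d"], {"c": ["a", "b"], "d": ["a"]}))
```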
Dynamic hardware specialization: using Moore's bounty without burning the chip down
K. Sankaralingam
{"title":"Dynamic hardware specialization-using moore's bounty without burning the chip down","authors":"K. Sankaralingam","doi":"10.1109/CASES.2013.6662522","DOIUrl":"https://doi.org/10.1109/CASES.2013.6662522","url":null,"abstract":"Summary form only given. The era of faster, smaller, greener (more power efficient) transistors in every successive generation appears to be dead. Due to slowing voltage scaling power has becoming a primary design constraint. Using conventional microprocessor techniques does not provide performance improvements without excessive power consumption. Instead, processor architects and microarchitects are going to be partially burdened with power-efficiently and energy-efficiently improving performance with technology scaling providing density improvements “alone”. The DySER project investigates ways for dynamically specializing datapaths to energy-efficiently improve performance. DySER attempts to provide a truly general purpose accelerator, avoiding radical changes to software development, ISA, or microarchitecture. The DySER accelerator is based on three principles: i) Exploit frequently executed, specializable code regions. ii) Dynamically configure the DySER accelerator hardware for particular regions. iii) Integrate the accelerator tightly, but non-intrusively, to a processor pipeline.We have completed a full prototype implementation of DySER integrated into the OpenSPARC processor (called SPARCDySER), a co-designed compiler in LLVM, and a detailed performance evaluation on an FPGA system, which runs an Ubuntu Linux distribution and full applications. Through the prototype, we evaluate the fundamental principles of DySER acceleration, namely: exploiting specializable regions, dynamically specializing hardware, and tight processor integration. To this end, we explore the accelerator's performance, power, and area, and consider comparisons to state-of-the-art microprocessors using energy/performance frontier analysis of both the prototype and simulated DySERaccelerated cores. Compared to the OpenSPARC processor, DySER provides 6.2X performance improvements and 4X energy reduction. DySER's approach of dynamic specialization is a promising way to address the imminent power challenges.","PeriodicalId":354180,"journal":{"name":"2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124798600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
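A hedged sketch of the first principle, selecting frequently executed, specializable regions from profile data; the profile numbers and the `is_specializable` test are stand-ins for the real compiler analysis, which checks whether a region's dataflow fits the accelerator substrate.

```python
# Invented profile: execution counts per candidate code region.
profile = {"loop_fir": 9_200_000, "loop_fft": 4_100_000,
           "init": 1, "error_path": 12}

def is_specializable(region):
    # Placeholder predicate standing in for the compiler's dataflow check.
    return region.startswith("loop_")

# Rank regions by hotness and keep the specializable ones above a threshold.
hot = [r for r, count in sorted(profile.items(), key=lambda kv: -kv[1])
       if count > 1_000_000 and is_specializable(r)]
print(hot)    # regions chosen for dynamic specialization
```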
Exploiting phase inter-dependencies for faster iterative compiler optimization phase order searches
Michael R. Jantz, P. Kulkarni
{"title":"Exploiting phase inter-dependencies for faster iterative compiler optimization phase order searches","authors":"Michael R. Jantz, P. Kulkarni","doi":"10.1109/CASES.2013.6662511","DOIUrl":"https://doi.org/10.1109/CASES.2013.6662511","url":null,"abstract":"The problem of finding the most effective set and ordering of optimization phases to generate the best quality code is a fundamental issue in compiler optimization research. Unfortunately, the exorbitantly large phase order search spaces in current compilers make both exhaustive as well as heuristic approaches to search for the ideal optimization phase combination impractical in most cases. In this paper we show that one important reason existing search techniques are so expensive is because they make no attempt to exploit well-known independence relationships between optimization phases to reduce the search space, and correspondingly improve the search time. We explore the impact of two complementary techniques to prune typical phase order search spaces. Our first technique studies the effect of implicit application of cleanup phases, while the other partitions the set of phases into mutually independent groups and develops new multi-stage search algorithms that substantially reduce the search time with no effect on best delivered code performance. Together, our techniques prune the exhaustive phase order search space size by 89%, on average, (96.75% total search space reduction) and show immense potential at making iterative phase order searches more feasible and practical. The pruned search space enables us to find a small set of distinct phase sequences that reach near-optimal phase ordering performance for all our benchmark functions as well as to improve the behavior of our genetic algorithm based heuristic search.","PeriodicalId":354180,"journal":{"name":"2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124397269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 22
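A back-of-the-envelope illustration of why partitioning into mutually independent groups shrinks the search: ordering all phases jointly grows factorially, while a multi-stage search that orders one group at a time only pays the per-group factorials. The phase names and grouping below are invented, not the paper's measured partition.

```python
from math import factorial

phases = ["copy_prop", "dead_code", "cse", "licm", "reg_alloc", "sched"]
groups = [["copy_prop", "dead_code", "cse"], ["licm"], ["reg_alloc", "sched"]]

joint  = factorial(len(phases))                   # 6! = 720 joint orderings
staged = sum(factorial(len(g)) for g in groups)   # 3! + 1! + 2! = 9 stage evaluations
print(joint, staged, 1 - staged / joint)          # roughly 98.8% fewer orderings tried
```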
Effective code discovery for ARM/Thumb mixed ISA binaries in a static binary translator
Jiunn-Yeu Chen, Bor-Yeh Shen, Q. Ou, Wuu Yang, W. Hsu
{"title":"Effective code discovery for ARM/Thumb mixed ISA binaries in a static binary translator","authors":"Jiunn-Yeu Chen, Bor-Yeh Shen, Q. Ou, Wuu Yang, W. Hsu","doi":"10.1109/CASES.2013.6662525","DOIUrl":"https://doi.org/10.1109/CASES.2013.6662525","url":null,"abstract":"Code discovery has been a main challenge for static binary translation, especially when the source ISA (Instruction Set Architecture) has variable-length instructions, such as the X86 architectures. Due to embedded data such as PC-relative data, jump tables, or paddings in the code section, a binary translator may be misled to translate data as instructions. With variable length instructions, once data is mis-translated as instructions, subsequent decoding of instructions could be wrong. This paper concerns static binary translation for the ARM architectures, which dominate the embedded-system market. Although ARM is considered RISC (Reduced Instruction Set Computing) in many aspects of processors, it does allow the mix of 32-bit instructions (ARM) with 16-bit instructions (Thumb) in the ARM/Thumb mixed executables. Since the instruction lengths of ARM and Thumb are not equal, the locations of the instructions could be 4-byte or 2-byte aligned addresses, respectively. Furthermore, because ARM and Thumb instructions share encoding space, a 4-byte word could be decoded as one ARM instruction or two Thumb instructions. The correct decoding of this 4-byte word is actually determined at run time by the least significant bit of the program counter. For unstripped binaries, mapping symbols can be used to identify ARM code regions and Thumb code regions. However, for stripped binaries, such mapping symbols are not available to assist translation. We have proposed a novel solution to statically translate the stripped executables for the ARM/Thumb mixed ISA. Our static binary translator includes a translation pass which guarantees the correctness of the translated executable by generating multiple versions of translated code for runtime selection. The binary translator also includes a series of optimization analyses which discover and remove most of the code generated in the baseline translation. Based on the SPEC2006 benchmark suite, stripped ARM/Thumb mixed binaries translated by our static binary translator achieve good performance with only 25% of code size increase.","PeriodicalId":354180,"journal":{"name":"2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127320265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
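The run-time ambiguity the translator must resolve statically can be shown with a toy decoder: the same 4-byte word is either one ARM instruction or two Thumb halfwords, and, following the ARM interworking convention, the least significant bit of the target address selects the mode. The decoder below only splits bytes; real instruction decoding is far richer.

```python
import struct

def decode(word_bytes, target_addr):
    """Decode 4 bytes as ARM or Thumb depending on the target address LSB."""
    if target_addr & 1:                       # Thumb state: two 16-bit units
        lo, hi = struct.unpack("<HH", word_bytes)
        return [f"thumb16 {lo:#06x}", f"thumb16 {hi:#06x}"]
    else:                                     # ARM state: one 32-bit unit
        (w,) = struct.unpack("<I", word_bytes)
        return [f"arm32 {w:#010x}"]

word = struct.pack("<I", 0xE3A00001)
print(decode(word, 0x8000))                   # same bytes decoded as ARM
print(decode(word, 0x8001))                   # ... and as two Thumb halfwords
```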
Aging-aware hardware-software task partitioning for reliable reconfigurable multiprocessor systems
Anup Das, Akash Kumar, B. Veeravalli
{"title":"Aging-aware hardware-software task partitioning for reliable reconfigurable multiprocessor systems","authors":"Anup Das, Akash Kumar, B. Veeravalli","doi":"10.1109/CASES.2013.6662505","DOIUrl":"https://doi.org/10.1109/CASES.2013.6662505","url":null,"abstract":"Homogeneous multiprocessor systems with reconfigurable area (also known as Reconfigurable Multiprocessor Systems) are emerging as a popular design choice in current and future technology nodes to meet the heterogeneous computing demand of a multitude of applications enabled on these platforms. Application specific mapping decisions on such a platform involve partitioning a given application into software tasks (executed on one or more of the general purpose processors, GPPs) and the hardware tasks (realized as dedicated hardware on the reconfigurable area) to optimize and/or satisfy design constraints such as reliability, performance and design cost. Improving the reliability considering transient faults by increasing the number of checkpoints negatively impacts the reliability considering permanent faults. This trade-off is ignored in all prior studies on task mapping and scheduling. This paper proposes an optimization technique to decide the optimal number of checkpoints for the software tasks which minimizes aging of the GPPs while maximizing the transient fault-tolerance of the overall platform (GPPs and the reconfigurable area) and satisfying design cost and performance. Experiments conducted with synthetic and real-life application task graphs (cyclic and acyclic) demonstrate that the proposed technique minimizes aging and improves the platform lifetime by an average 60% as compared to the existing transient fault-aware techniques. Further, a gradient-based heuristic is proposed to minimize the design space exploration time by upto 500× with less than 5% deviation from optimal solution.","PeriodicalId":354180,"journal":{"name":"2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)","volume":"160 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114736305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 32
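An invented toy objective for the trade-off described above: more checkpoints shorten the expected re-execution after a transient fault but add activity that accelerates aging, so there is an interior optimum. The cost terms and weights are placeholders, not the paper's optimization model.

```python
def cost(n_ckpt, task_len, transient_rate, aging_weight):
    # Expected rollback work shrinks with more checkpoints ...
    rollback = transient_rate * task_len * (task_len / (n_ckpt + 1)) / 2
    # ... while the (placeholder) aging penalty grows with checkpoint activity.
    aging = aging_weight * n_ckpt
    return rollback + aging

best = min(range(0, 20), key=lambda n: cost(n, task_len=100.0,
                                            transient_rate=0.01,
                                            aging_weight=0.4))
print(best)   # -> 10 checkpoints for these illustrative parameters
```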
Compiled multithreaded data paths on FPGAs for dynamic workloads
R. Halstead, W. Najjar
{"title":"Compiled multithreaded data paths on FPGAs for dynamic workloads","authors":"R. Halstead, W. Najjar","doi":"10.1109/CASES.2013.6662507","DOIUrl":"https://doi.org/10.1109/CASES.2013.6662507","url":null,"abstract":"Hardware supported multithreading can mask memory latency by switching the execution to ready threads, which is particularly effective on irregular applications. FPGAs provide an opportunity to have multithreaded data paths customized to each individual application. In this paper we describe the compiler generation of these hardware structures from a C subset targeting a Convey HC-2ex machine. We describe how this compilation approach differs from other C to HDL compilers. We use the compiler to generate a multithreaded sparse matrix vector multiplication kernel and compare its performance to existing FPGA, and highly optimized software implementations.","PeriodicalId":354180,"journal":{"name":"2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122502114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 22
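For reference, the evaluated kernel, sparse matrix-vector multiplication in CSR form, written plainly to show the irregular `x[col_idx[j]]` gathers whose latency a multithreaded data path can hide by switching among ready row-threads. This is just the textbook computation, not the compiler's generated hardware.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """y = A*x for a sparse matrix A stored in compressed sparse row form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):                      # conceptually one thread per row
        acc = 0.0
        for j in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[j] * x[col_idx[j]]       # gather: irregular memory access
        y[row] = acc
    return y

# 2x3 example matrix [[1, 0, 2], [0, 3, 0]] times x = [1, 1, 1]
print(spmv_csr([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))  # [3.0, 3.0]
```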
EVA: An efficient vision architecture for mobile systems
Jason Clemons, Andrea Pellegrini, S. Savarese, T. Austin
{"title":"EVA: An efficient vision architecture for mobile systems","authors":"Jason Clemons, Andrea Pellegrini, S. Savarese, T. Austin","doi":"10.1109/CASES.2013.6662517","DOIUrl":"https://doi.org/10.1109/CASES.2013.6662517","url":null,"abstract":"The capabilities of mobile devices have been increasing at a momentous rate. As better processors have merged with capable cameras in mobile systems, the number of computer vision applications has grown rapidly. However, the computational and energy constraints of mobile devices have forced computer vision application developers to sacrifice accuracy for the sake of meeting timing demands. To increase the computational performance of mobile systems we present EVA. EVA is an application-specific heterogeneous multicore having a mix of computationally powerful cores with energy efficient cores. Each core of EVA has computation and memory architectural enhancements tailored to the application traits of vision codes. Using a computer vision benchmarking suite, we evaluate the efficiency and performance of a wide range of EVA designs. We show that EVA can provide speedups of over 9× that of an embedded processor while reducing energy demands by as much as 3×.","PeriodicalId":354180,"journal":{"name":"2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)","volume":"322 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120895502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 12