Latest publications from the 2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)

Scrubbing unit repositioning for fast error repair in FPGAs
G. Nazar, Leonardo P. Santos, L. Carro
{"title":"Scrubbing unit repositioning for fast error repair in FPGAs","authors":"G. Nazar, Leonardo P. Santos, L. Carro","doi":"10.1109/CASES.2013.6662506","DOIUrl":"https://doi.org/10.1109/CASES.2013.6662506","url":null,"abstract":"Field Programmable Gate Arrays (FPGAs) are very successful platforms that rely on large configuration memories to store the circuit functions required by users. Faults affecting such memories are a major dependability threat for these devices, and the applicability of FPGAs on critical systems depends on efficient means to mitigate their effects. The main means to effectively remove such faults, namely configuration scrubbing, consists in rewriting the desired contents of this memory and suffers from high power consumption and a long mean time to repair (MTTR). In this work we propose Scrubbing Unit Repositioning for Fast Error Repair (SURFER), a novel approach to exploit partial dynamic reconfiguration coupled with fine-grained redundancy to greatly reduce the MTTR for FPGAs subject to upsets in their configuration memories.","PeriodicalId":354180,"journal":{"name":"2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122000717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 14
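To make the repositioning idea concrete, here is a minimal Python sketch of error-directed scrubbing: rather than sweeping configuration frames strictly in order, the scrubber jumps to the region flagged by error detection so a corrupted frame is rewritten sooner, shortening MTTR. The frame list, the `error_hint` parameter and the `scrub` function are illustrative inventions, not the SURFER hardware or its actual repair flow.

```python
# Hypothetical sketch of repositionable scrubbing: jump the scrubbing unit to
# the region flagged by error detection instead of always sweeping in order.

def scrub(frames, golden, error_hint=None):
    """Rewrite configuration frames, starting near a reported error if any."""
    order = list(range(len(frames)))
    if error_hint is not None:
        # Reposition: visit the suspected region first, then the rest.
        order.sort(key=lambda i: abs(i - error_hint))
    repaired = 0
    for i in order:
        if frames[i] != golden[i]:
            frames[i] = golden[i]      # rewrite the corrupted frame
            repaired += 1
    return repaired

# Example: an upset near frame 7 is repaired on the first write instead of
# after a full sequential sweep.
golden = [0xA5] * 16
frames = list(golden)
frames[7] ^= 0x01                      # injected single-bit upset
print(scrub(frames, golden, error_hint=7))   # -> 1
```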
Fault detection and recovery efficiency co-optimization through compile-time analysis and runtime adaptation
Hao Chen, Chengmo Yang
{"title":"Fault detection and recovery efficiency co-optimization through compile-time analysis and runtime adaptation","authors":"Hao Chen, Chengmo Yang","doi":"10.1109/CASES.2013.6662528","DOIUrl":"https://doi.org/10.1109/CASES.2013.6662528","url":null,"abstract":"The ever scaling-down feature size and noise margin keep elevating hardware failure rates, requiring the incorporation of fault tolerance into computer systems. One fault tolerance scheme that receives a lot of research attention is redundant execution. However, existing solutions are developed under the assumption that the fault rate is low. These techniques either solely focus on fault detection, or sometimes even increase recovery cost to reduce fault detection overhead. The lack of overall efficiency makes them insufficient and inappropriate for embedded systems with tight energy and cost budget. Our study shows that checkpoint frequency and fault rate are two critical parameters determining the overall fault detection and recovery overhead. To co-optimize detection and recovery, we statically construct a mathematical model, capable of taking application and architecture characteristics into consideration and identifying the optimal checkpoint frequency of an application for a given fault rate. Moreover, as the fault rate is infeasible to predict a priori, we furthermore propose a set of heuristics, enabling the system to dynamically monitor the fault rate and adapt the checkpoint frequency accordingly. The efficacy of the static and the adaptive optimizations is evaluated through detailed instructionlevel simulation. The results show that the optimal checkpoint frequency identified by the static model is very close to the actual value (6% deviation) and the run-time adaptation scheme effectively reduces the overhead caused by the unpredictability in fault rate.","PeriodicalId":354180,"journal":{"name":"2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116012695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 9
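The trade-off the static model captures can be illustrated with a generic first-order checkpointing model (essentially Young's approximation), not the paper's architecture-aware formulation: checkpointing more often adds overhead per unit of work but bounds the re-execution lost to a fault. The cost function and parameters below are placeholders for illustration only.

```python
# Generic model: with checkpoint cost C seconds and fault rate lam
# faults/second, the overhead per unit time for a checkpoint interval tau is
# roughly the checkpointing cost plus the expected rollback work.
import math

def overhead(tau, C, lam):
    return C / tau + lam * tau / 2.0

def optimal_interval(C, lam):
    # d(overhead)/d(tau) = 0  =>  tau_opt = sqrt(2*C/lam)
    return math.sqrt(2.0 * C / lam)

C, lam = 0.5, 1e-4                 # illustrative numbers only
tau = optimal_interval(C, lam)
print(tau, overhead(tau, C, lam))  # 100.0, 0.01
```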
Platform-dependent code generation for embedded real-time software
Baekgyu Kim, L. T. Phan, O. Sokolsky, Insup Lee
{"title":"Platform-dependent code generation for embedded real-time software","authors":"Baekgyu Kim, L. T. Phan, O. Sokolsky, Insup Lee","doi":"10.1109/cases.2013.6662512","DOIUrl":"https://doi.org/10.1109/cases.2013.6662512","url":null,"abstract":"Code generation for embedded systems is challenging, since the generated code (e.g., C code) is expected to run on a heterogeneous set of target platforms with different characteristics, such as hardware/software architectures and programming interfaces. We propose a code generation framework that provides the flexibility to generate different source code that is executable on each target platform. In our framework, the platform-dependent characteristics of a target platform are explicitly specified by an Architectural Analysis Description Language (AADL) model and a code snippet repository. The AADL model captures hardware/software architectural aspects of the platform, such as periodic/aperiodic threads and their interactions with sensors and actuators. The code snippet repository contains platform-dependent code snippets that are categorized according to the functions required to implement the components of the AADL model. These two elements of the platform capability are then used by the code generation algorithm to generate platform-dependent code for the given platform. We demonstrate the applicability of our framework using a case study of code generation for two infusion pump systems.","PeriodicalId":354180,"journal":{"name":"2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114959647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 26
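A minimal sketch of the repository idea, with invented keys and snippets: platform-dependent fragments are filed under (platform, AADL component kind) and the generator stitches together the ones matching the target. None of the names below come from the paper's framework.

```python
# Hypothetical snippet repository keyed by (platform, AADL component kind).
SNIPPETS = {
    ("platform_a", "periodic_thread"): "timer_create(...);            /* 10 ms period */",
    ("platform_b", "periodic_thread"): "osal_task_spawn(...);         /* 10 ms period */",
    ("platform_a", "actuator_write"):  "dac_write(PUMP_CHANNEL, rate);",
    ("platform_b", "actuator_write"):  "pump_driver_set_rate(rate);",
}

def emit(platform, components):
    """Assemble platform-specific C fragments for the given AADL components."""
    return "\n".join(SNIPPETS[(platform, kind)] for kind in components)

model = ["periodic_thread", "actuator_write"]   # component kinds from an AADL model
print(emit("platform_a", model))
print(emit("platform_b", model))
```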
CAeSaR: Unified cluster-assignment scheduling and communication reuse for clustered VLIW processors
Vasileios Porpodas, Marcelo H. Cintra
{"title":"CAeSaR: Unified cluster-assignment scheduling and communication reuse for clustered VLIW processors","authors":"Vasileios Porpodas, Marcelo H. Cintra","doi":"10.1109/CASES.2013.6662513","DOIUrl":"https://doi.org/10.1109/CASES.2013.6662513","url":null,"abstract":"Clustered architectures have been proposed as a solution to the scalability problem of wide ILP processors. VLIW architectures, being wide-issue by design, benefit significantly from clustering. Such architectures, being both statically scheduled and clustered, require specialized code generation techniques, as they require explicit Inter-Cluster Copy instructions (ICCs) be scheduled in the code stream. In this work we propose CAeSaR, a novel instruction scheduling algorithm that improves code generation for such architectures. It combines cluster assignment, instruction scheduling and inter-cluster communication reuse all in one single unified algorithm. The proposed algorithm improves performance by any phase-ordering issues among these three code generation and optimization steps. We evaluate CAeSaR on the MediabenchII and SPEC CINT2000 benchmarks and compare it against the state-of-the-art instruction scheduling algorithm. Our results show an improvement in execution time of up to 20.3%, and 13.8% on average, over the current state-of-the-art across the benchmarks.","PeriodicalId":354180,"journal":{"name":"2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127173080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
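The following toy greedy pass illustrates what "unified" means here: for each instruction, the cluster choice and the need for an ICC are evaluated together, and a value already copied to a cluster is reused instead of generating a second ICC. The cost model and data structures are invented for illustration and are far simpler than CAeSaR's.

```python
def schedule(instrs, deps, clusters=2):
    """Greedy pass: pick each instruction's cluster while accounting for ICCs,
    reusing copies already made to that cluster."""
    placed, copies, iccs = {}, set(), 0
    for i in instrs:                              # assumed topologically sorted
        best_cost, best_cluster = None, None
        for c in range(clusters):
            cost = sum(1 for d in deps.get(i, [])
                       if placed[d] != c and (d, c) not in copies)
            if best_cost is None or cost < best_cost:
                best_cost, best_cluster = cost, c
        placed[i] = best_cluster
        for d in deps.get(i, []):
            if placed[d] != best_cluster:
                copies.add((d, best_cluster))     # the copied value stays reusable
        iccs += best_cost
    return placed, iccs

# Tiny DAG; with two clusters and no resource pressure everything lands on
# cluster 0 and no ICCs are charged.
print(schedule(["a", "b", "c", "d"], {"c": ["a", "b"], "d": ["a"]}))
```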
Dynamic hardware specialization: using Moore's bounty without burning the chip down
K. Sankaralingam
{"title":"Dynamic hardware specialization-using moore's bounty without burning the chip down","authors":"K. Sankaralingam","doi":"10.1109/CASES.2013.6662522","DOIUrl":"https://doi.org/10.1109/CASES.2013.6662522","url":null,"abstract":"Summary form only given. The era of faster, smaller, greener (more power efficient) transistors in every successive generation appears to be dead. Due to slowing voltage scaling power has becoming a primary design constraint. Using conventional microprocessor techniques does not provide performance improvements without excessive power consumption. Instead, processor architects and microarchitects are going to be partially burdened with power-efficiently and energy-efficiently improving performance with technology scaling providing density improvements “alone”. The DySER project investigates ways for dynamically specializing datapaths to energy-efficiently improve performance. DySER attempts to provide a truly general purpose accelerator, avoiding radical changes to software development, ISA, or microarchitecture. The DySER accelerator is based on three principles: i) Exploit frequently executed, specializable code regions. ii) Dynamically configure the DySER accelerator hardware for particular regions. iii) Integrate the accelerator tightly, but non-intrusively, to a processor pipeline.We have completed a full prototype implementation of DySER integrated into the OpenSPARC processor (called SPARCDySER), a co-designed compiler in LLVM, and a detailed performance evaluation on an FPGA system, which runs an Ubuntu Linux distribution and full applications. Through the prototype, we evaluate the fundamental principles of DySER acceleration, namely: exploiting specializable regions, dynamically specializing hardware, and tight processor integration. To this end, we explore the accelerator's performance, power, and area, and consider comparisons to state-of-the-art microprocessors using energy/performance frontier analysis of both the prototype and simulated DySERaccelerated cores. Compared to the OpenSPARC processor, DySER provides 6.2X performance improvements and 4X energy reduction. DySER's approach of dynamic specialization is a promising way to address the imminent power challenges.","PeriodicalId":354180,"journal":{"name":"2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124798600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
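A hedged sketch of the first principle, selecting frequently executed, specializable regions from profile data; the profile numbers and the `is_specializable` test are stand-ins for the real compiler analysis, which checks whether a region's dataflow fits the accelerator substrate.

```python
# Invented profile: execution counts per candidate code region.
profile = {"loop_fir": 9_200_000, "loop_fft": 4_100_000,
           "init": 1, "error_path": 12}

def is_specializable(region):
    # Placeholder predicate standing in for the compiler's dataflow check.
    return region.startswith("loop_")

# Rank regions by hotness and keep the specializable ones above a threshold.
hot = [r for r, count in sorted(profile.items(), key=lambda kv: -kv[1])
       if count > 1_000_000 and is_specializable(r)]
print(hot)    # regions chosen for dynamic specialization
```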
Exploiting phase inter-dependencies for faster iterative compiler optimization phase order searches
Michael R. Jantz, P. Kulkarni
{"title":"Exploiting phase inter-dependencies for faster iterative compiler optimization phase order searches","authors":"Michael R. Jantz, P. Kulkarni","doi":"10.1109/CASES.2013.6662511","DOIUrl":"https://doi.org/10.1109/CASES.2013.6662511","url":null,"abstract":"The problem of finding the most effective set and ordering of optimization phases to generate the best quality code is a fundamental issue in compiler optimization research. Unfortunately, the exorbitantly large phase order search spaces in current compilers make both exhaustive as well as heuristic approaches to search for the ideal optimization phase combination impractical in most cases. In this paper we show that one important reason existing search techniques are so expensive is because they make no attempt to exploit well-known independence relationships between optimization phases to reduce the search space, and correspondingly improve the search time. We explore the impact of two complementary techniques to prune typical phase order search spaces. Our first technique studies the effect of implicit application of cleanup phases, while the other partitions the set of phases into mutually independent groups and develops new multi-stage search algorithms that substantially reduce the search time with no effect on best delivered code performance. Together, our techniques prune the exhaustive phase order search space size by 89%, on average, (96.75% total search space reduction) and show immense potential at making iterative phase order searches more feasible and practical. The pruned search space enables us to find a small set of distinct phase sequences that reach near-optimal phase ordering performance for all our benchmark functions as well as to improve the behavior of our genetic algorithm based heuristic search.","PeriodicalId":354180,"journal":{"name":"2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124397269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 22
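A back-of-the-envelope illustration of why partitioning into mutually independent groups shrinks the search: ordering all phases jointly grows factorially, while a multi-stage search that orders one group at a time only pays the per-group factorials. The phase names and grouping below are invented, not the paper's measured partition.

```python
from math import factorial

phases = ["copy_prop", "dead_code", "cse", "licm", "reg_alloc", "sched"]
groups = [["copy_prop", "dead_code", "cse"], ["licm"], ["reg_alloc", "sched"]]

joint  = factorial(len(phases))                   # 6! = 720 joint orderings
staged = sum(factorial(len(g)) for g in groups)   # 3! + 1! + 2! = 9 stage evaluations
print(joint, staged, 1 - staged / joint)          # roughly 98.8% fewer orderings tried
```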
Effective code discovery for ARM/Thumb mixed ISA binaries in a static binary translator
Jiunn-Yeu Chen, Bor-Yeh Shen, Q. Ou, Wuu Yang, W. Hsu
{"title":"Effective code discovery for ARM/Thumb mixed ISA binaries in a static binary translator","authors":"Jiunn-Yeu Chen, Bor-Yeh Shen, Q. Ou, Wuu Yang, W. Hsu","doi":"10.1109/CASES.2013.6662525","DOIUrl":"https://doi.org/10.1109/CASES.2013.6662525","url":null,"abstract":"Code discovery has been a main challenge for static binary translation, especially when the source ISA (Instruction Set Architecture) has variable-length instructions, such as the X86 architectures. Due to embedded data such as PC-relative data, jump tables, or paddings in the code section, a binary translator may be misled to translate data as instructions. With variable length instructions, once data is mis-translated as instructions, subsequent decoding of instructions could be wrong. This paper concerns static binary translation for the ARM architectures, which dominate the embedded-system market. Although ARM is considered RISC (Reduced Instruction Set Computing) in many aspects of processors, it does allow the mix of 32-bit instructions (ARM) with 16-bit instructions (Thumb) in the ARM/Thumb mixed executables. Since the instruction lengths of ARM and Thumb are not equal, the locations of the instructions could be 4-byte or 2-byte aligned addresses, respectively. Furthermore, because ARM and Thumb instructions share encoding space, a 4-byte word could be decoded as one ARM instruction or two Thumb instructions. The correct decoding of this 4-byte word is actually determined at run time by the least significant bit of the program counter. For unstripped binaries, mapping symbols can be used to identify ARM code regions and Thumb code regions. However, for stripped binaries, such mapping symbols are not available to assist translation. We have proposed a novel solution to statically translate the stripped executables for the ARM/Thumb mixed ISA. Our static binary translator includes a translation pass which guarantees the correctness of the translated executable by generating multiple versions of translated code for runtime selection. The binary translator also includes a series of optimization analyses which discover and remove most of the code generated in the baseline translation. Based on the SPEC2006 benchmark suite, stripped ARM/Thumb mixed binaries translated by our static binary translator achieve good performance with only 25% of code size increase.","PeriodicalId":354180,"journal":{"name":"2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127320265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
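The run-time ambiguity the translator must resolve statically can be shown with a toy decoder: the same 4-byte word is either one ARM instruction or two Thumb halfwords, and, following the ARM interworking convention, the least significant bit of the target address selects the mode. The decoder below only splits bytes; real instruction decoding is far richer.

```python
import struct

def decode(word_bytes, target_addr):
    """Decode 4 bytes as ARM or Thumb depending on the target address LSB."""
    if target_addr & 1:                       # Thumb state: two 16-bit units
        lo, hi = struct.unpack("<HH", word_bytes)
        return [f"thumb16 {lo:#06x}", f"thumb16 {hi:#06x}"]
    else:                                     # ARM state: one 32-bit unit
        (w,) = struct.unpack("<I", word_bytes)
        return [f"arm32 {w:#010x}"]

word = struct.pack("<I", 0xE3A00001)
print(decode(word, 0x8000))                   # same bytes decoded as ARM
print(decode(word, 0x8001))                   # ... and as two Thumb halfwords
```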
Aging-aware hardware-software task partitioning for reliable reconfigurable multiprocessor systems
Anup Das, Akash Kumar, B. Veeravalli
{"title":"Aging-aware hardware-software task partitioning for reliable reconfigurable multiprocessor systems","authors":"Anup Das, Akash Kumar, B. Veeravalli","doi":"10.1109/CASES.2013.6662505","DOIUrl":"https://doi.org/10.1109/CASES.2013.6662505","url":null,"abstract":"Homogeneous multiprocessor systems with reconfigurable area (also known as Reconfigurable Multiprocessor Systems) are emerging as a popular design choice in current and future technology nodes to meet the heterogeneous computing demand of a multitude of applications enabled on these platforms. Application specific mapping decisions on such a platform involve partitioning a given application into software tasks (executed on one or more of the general purpose processors, GPPs) and the hardware tasks (realized as dedicated hardware on the reconfigurable area) to optimize and/or satisfy design constraints such as reliability, performance and design cost. Improving the reliability considering transient faults by increasing the number of checkpoints negatively impacts the reliability considering permanent faults. This trade-off is ignored in all prior studies on task mapping and scheduling. This paper proposes an optimization technique to decide the optimal number of checkpoints for the software tasks which minimizes aging of the GPPs while maximizing the transient fault-tolerance of the overall platform (GPPs and the reconfigurable area) and satisfying design cost and performance. Experiments conducted with synthetic and real-life application task graphs (cyclic and acyclic) demonstrate that the proposed technique minimizes aging and improves the platform lifetime by an average 60% as compared to the existing transient fault-aware techniques. Further, a gradient-based heuristic is proposed to minimize the design space exploration time by upto 500× with less than 5% deviation from optimal solution.","PeriodicalId":354180,"journal":{"name":"2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)","volume":"160 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114736305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 32
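An invented toy objective for the trade-off described above: more checkpoints shorten the expected re-execution after a transient fault but add activity that accelerates aging, so there is an interior optimum. The cost terms and weights are placeholders, not the paper's optimization model.

```python
def cost(n_ckpt, task_len, transient_rate, aging_weight):
    # Expected rollback work shrinks with more checkpoints ...
    rollback = transient_rate * task_len * (task_len / (n_ckpt + 1)) / 2
    # ... while the (placeholder) aging penalty grows with checkpoint activity.
    aging = aging_weight * n_ckpt
    return rollback + aging

best = min(range(0, 20), key=lambda n: cost(n, task_len=100.0,
                                            transient_rate=0.01,
                                            aging_weight=0.4))
print(best)   # -> 10 checkpoints for these illustrative parameters
```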
Compiled multithreaded data paths on FPGAs for dynamic workloads
R. Halstead, W. Najjar
{"title":"Compiled multithreaded data paths on FPGAs for dynamic workloads","authors":"R. Halstead, W. Najjar","doi":"10.1109/CASES.2013.6662507","DOIUrl":"https://doi.org/10.1109/CASES.2013.6662507","url":null,"abstract":"Hardware supported multithreading can mask memory latency by switching the execution to ready threads, which is particularly effective on irregular applications. FPGAs provide an opportunity to have multithreaded data paths customized to each individual application. In this paper we describe the compiler generation of these hardware structures from a C subset targeting a Convey HC-2ex machine. We describe how this compilation approach differs from other C to HDL compilers. We use the compiler to generate a multithreaded sparse matrix vector multiplication kernel and compare its performance to existing FPGA, and highly optimized software implementations.","PeriodicalId":354180,"journal":{"name":"2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122502114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 22
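For reference, the evaluated kernel, sparse matrix-vector multiplication in CSR form, written plainly to show the irregular `x[col_idx[j]]` gathers whose latency a multithreaded data path can hide by switching among ready row-threads. This is just the textbook computation, not the compiler's generated hardware.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """y = A*x for a sparse matrix A stored in compressed sparse row form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):                      # conceptually one thread per row
        acc = 0.0
        for j in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[j] * x[col_idx[j]]       # gather: irregular memory access
        y[row] = acc
    return y

# 2x3 example matrix [[1, 0, 2], [0, 3, 0]] times x = [1, 1, 1]
print(spmv_csr([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))  # [3.0, 3.0]
```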
EVA: An efficient vision architecture for mobile systems
Jason Clemons, Andrea Pellegrini, S. Savarese, T. Austin
{"title":"EVA: An efficient vision architecture for mobile systems","authors":"Jason Clemons, Andrea Pellegrini, S. Savarese, T. Austin","doi":"10.1109/CASES.2013.6662517","DOIUrl":"https://doi.org/10.1109/CASES.2013.6662517","url":null,"abstract":"The capabilities of mobile devices have been increasing at a momentous rate. As better processors have merged with capable cameras in mobile systems, the number of computer vision applications has grown rapidly. However, the computational and energy constraints of mobile devices have forced computer vision application developers to sacrifice accuracy for the sake of meeting timing demands. To increase the computational performance of mobile systems we present EVA. EVA is an application-specific heterogeneous multicore having a mix of computationally powerful cores with energy efficient cores. Each core of EVA has computation and memory architectural enhancements tailored to the application traits of vision codes. Using a computer vision benchmarking suite, we evaluate the efficiency and performance of a wide range of EVA designs. We show that EVA can provide speedups of over 9× that of an embedded processor while reducing energy demands by as much as 3×.","PeriodicalId":354180,"journal":{"name":"2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)","volume":"322 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120895502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 12