{"title":"Rate-monotonic scheduling for reducing system-wide energy consumption for hard real-time systems","authors":"Linwei Niu","doi":"10.1109/ICCD.2010.5647804","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647804","url":null,"abstract":"In this paper, we present system-wide dynamic scheduling algorithms to reduce the energy consumption by both the core DVS processor and multiple non-DVS peripheral devices for hard real-time systems scheduled with rate-monotonic scheduling (RMS) scheme. In our research, we first present an approach to leverage the use of the critical speed strategy and the traditional DVS strategy based on the job workload to be finished within certain interval. Then dynamic scheduling approaches are proposed in the management of speed determination and device shut-down to reduce the energy at the system level. Compared with existing research, our approach can effectively reduce the overall energy consumption for both CPU and peripheral devices.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116865547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RTOS-aware modeling of embedded hardware/software systems","authors":"Matthias Müller, J. Gerlach, W. Rosenstiel","doi":"10.1109/ICCD.2010.5647795","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647795","url":null,"abstract":"Modern embedded systems such as mobile phones or electronic control units from the automotive domain include a bulk of highly complex and highly interacting functions. Due to several reasons—flexibility and cost effectiveness may be the most important ones—a large and permanently growing part of these functions is implemented in software. This comes along with the demand for more and more processing power, paving the way for multi-core architectures, and widespread use of real-time operating systems. Application software implementation and operating system configuration strongly influence the overall system behavior. Design methodologies for such complex systems, consisting of hardware, software and real-time operating systems, must provide an early, model-based view on the overall system. The approach described in this paper enables automatic generation of system-level models of complex systems from abstract application specifications. Additionally, a compiler-based technique allows automatic calculation of precise software runtime information and annotation of the generated model. The resulting system-level model facilitates early exploration of systems on high level of abstraction, taking into account functional and temporal characteristics of hardware, software and real-time operating system. A key feature of the approach is its high accuracy, which is shown by applying it to an industrial application from the automotive domain.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"76 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120882158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A flexible simulation methodology and tool for nanoarray-based architectures","authors":"S. Frache, M. Graziano, M. Zamboni","doi":"10.1109/ICCD.2010.5647586","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647586","url":null,"abstract":"Nanoscale arrays based on nanowires are expected to have a promising future thanks to their amazing density and regularity. Experiments demonstrated the feasibility of this technology and pointed out that accurate reliability analyses should be accomplished to assure proper yield requirements. Due to the complexity of these systems and the arising necessity of thorough fault analysis, design automation tools are mandatory in order to explore architectural solutions and fault tolerant approaches deriving information from reliable nanoarray characterisation.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127340899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving cache performance by combining cost-sensitivity and locality principles in cache replacement algorithms","authors":"Rami Sheikh, Mazen Kharbutli","doi":"10.1109/ICCD.2010.5647594","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647594","url":null,"abstract":"Due to the ever increasing performance gap between the processor and the main memory, it becomes crucial to bridge that gap by designing an efficient memory hierarchy capable of reducing the average memory access time. The cache replacement algorithm plays a central role in designing an efficient memory hierarchy. Many of the recent studies in cache replacement algorithms have focused on improving L2 cache replacement algorithms by minimizing the miss count. However, depending on the dependency chain, cache miss bursts, and other factors, a processor's ability to partially hide the cost of an L2 cache miss varies; that is, cache miss costs are not uniform. Therefore, a better solution would account also for the aggregate miss cost in designing cache replacement algorithms. Our proposed solution combines the two principles of locality and cost-sensitivity into one which we call: LACS: Locality-Aware Cost-Sensitive cache replacement algorithm. LACS estimates a cache block's cost from the number of instructions the processor manages to issue during a cache miss on that block and then victimizes cache blocks with low cost and poor locality in order to maximize the overall cache performance. When LACS is evaluated using a uniprocessor architecture model, it speeds up 10 L2 cache performance-constrained SPEC CPU2000 benchmarks by up to 85% and 15% on average while not slowing down any of the 20 SPEC CPU2000 benchmarks evaluated. When evaluated using a dual-core CMP architecture model, LACS speeds up 6 SPEC CPU2000 benchmark pairs by up to 44% and 11% on average.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122971927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lizard: Energy-efficient hard fault detection, diagnosis and isolation in the ALU","authors":"Seokin Hong, Soontae Kim","doi":"10.1109/ICCD.2010.5647708","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647708","url":null,"abstract":"Digital circuits are expected to increasingly suffer from more hard faults due to technology scaling. Especially, a single hard fault in the ALU might lead to a total failure in the embedded systems. In addition, energy efficiency is critical in these systems. To address these increasingly important problems in the ALU, we propose a novel energy-efficient fault-tolerant ALU design called Lizard. Lizard utilizes two 16-bit ALUs to perform 32-bit computations with fault detection and diagnosis. By exploiting predictable operations, fault detection is performed in a single cycle. The 16-bit ALUs can be partitioned into two 8-bit ALUs. When a fault occurs in one of the four 8-bit ALUs, Lizard diagnoses and isolates a faulty 8-bit ALU for itself. After the faulty 8-bit ALU is isolated, Lizard continues its operation using the remaining three 8-bit ALUs, which can detect and isolate another fault. In this way, Lizard can survive faults on at most two sub-ALUs increasing its lifetime and fault tolerance. We conducted comparative evaluations with an unprotected ALU, triple modular redundancy ALU, and quadruple time redundancy ALU in terms of area, energy consumption, performance, and reliability. It is demonstrated that Lizard outperforms other ALU designs in most cases, especially in energy efficiency.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124716042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic register file partitioning in superscalar microprocessors for energy efficiency","authors":"Meltem Ozsoy, Yusuf Onur Koçberber, M. Kayaalp, O. Ergin","doi":"10.1109/ICCD.2010.5647631","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647631","url":null,"abstract":"Register file is one of the vital and energy consuming parts inside microprocessor. Many studies show that it is one of the hot spots on the chip. It is also observed by many researchers that many of the produced values in a processor are narrow. By using the narrow values, register files can store fewer bits and may be designed to need less static and dynamic energy. In this paper we propose a register file design that stores data in narrow value groups and values are written to those groups according to their widths. Size of narrow value groups can be set dynamically according to the behavior of the program while having the same performance. We show that the register file which has dynamically changing narrow value groups offers static and dynamic energy savings in the register file up to 65% with negligible performance loss.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116499876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scenario-based design space exploration of MPSoCs","authors":"P. V. Stralen, A. Pimentel","doi":"10.1109/ICCD.2010.5647727","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647727","url":null,"abstract":"Early design space exploration (DSE) is a key ingredient in system-level design of MPSoC-based embedded systems. The state of the art in this field typically still explores systems under a single, fixed application workload. In reality, however, the applications are concurrently executing and contending for system resources in such systems. As a result, the intensity and nature of application demands can change dramatically over time. This paper therefore introduces the concept of workload scenarios in the DSE process, capturing dynamic behavior both within and between applications. More specifically, we present and evaluate a novel, scenario-based DSE approach based on a coevolutionary genetic algorithm.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121654772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Practical completion detection for 2-of-N delay-insensitive codes","authors":"Marco Cannizzaro, Weiwei Jiang, S. Nowick","doi":"10.1109/ICCD.2010.5647809","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647809","url":null,"abstract":"There is increasing interest in using m-of-n delay-insensitive codes for robust asynchronous global communication, to support the design of coding-efficient and low-power channels. However, a fundamental obstacle in using these codes has been complex and expensive hardware support. This paper addresses this issue, introducing and evaluating practical completion detector units for 2-of-n codes. Designs are proposed for both return-to-zero (RZ) and non-return-to-zero (NRZ) codes. The RZ designs build on prior work of Piestrak [14]; this paper proposes a small modification to their work to provide a fully timing-robust (i.e. quasi-delay insensitive, or QDI) version. The main contribution of the paper is an efficient completion for NRZ 2-of-n codes. Both detector architectures are modular and simple, composed of basic cells in a binary tree. Initial simulation results were performed on several implementations of a 2-of-9 detector using Cadence's Spectre environment, after mapping to a 90nm standard cell library. The new RZ detector has 35% area reduction and comparable delays and energy to the earlier Piestrak design, but unlike the latter, ensures robust QDI operation. The new NRZ detector is shown to have negligible stabilization time between successive codewords (0.05–0.19 ns) when compared to a recent alternative approach.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"55 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132463123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IP characterization methodology for fast and accurate power consumption estimation at transactional level model","authors":"Michel Rogers-Vallée, Marc-André Cantin, Laurent Moss, G. Bois","doi":"10.1109/ICCD.2010.5647622","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647622","url":null,"abstract":"Estimating the power consumption of System on Chip as early as possible in the design life cycle is important to meet the time to market requirements. For this purpose, most research is turning toward high-level models, like TLM, to estimate power earlier. This paper presents a high-level IP oriented power estimation methodology. The methodology separates the activity of the IP from the implementation. This allows the ability to easily create a model that can be used with different frequencies, layout and implementation technology. By using data gathered from the RTL a model can be created for high-level simulation that can take into account the technology and characteristics of the FPGA device. The methodology is presented in this paper with a processor and its local memory IP from Xilinx. Compared to estimations made at the RTL level, the resulting model gives accurate results of 15% with three to four order speedups and through different implementations.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"380 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131720049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Thermal-aware scratchpad memory design and allocation","authors":"M. Damavandpeyma, S. Stuijk, T. Basten, M. Geilen, H. Corporaal","doi":"10.1109/ICCD.2010.5647616","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647616","url":null,"abstract":"Scratchpad memories (SPMs) have become a promising on-chip storage solution for embedded systems from an energy, performance and predictability perspective. The thermal behavior of these types of memories has not been considered in detail. This thermal behavior plays an important role in the reliability of silicon devices and in their static (leakage) power consumption. In this paper, we propose two different techniques to improve the thermal behavior of SPMs. First, we propose a hardware-based, thermal-aware address translation technique that physically distributes memory accesses to consecutive addresses evenly over the whole memory area. Second, we propose a software-based, thermal-aware address generation technique. This technique tries to distribute the variables that are allocated to the SPM in such a way that an even thermal distribution is achieved. The first technique works particularly well for applications with a regular access pattern, whereas the second technique can also improve the behavior of applications with irregular access patterns. The two techniques thus complement each other and work well together. Using the first technique we show that the peak temperature of an SPM in 65nm technology, when running a typical streaming application, is decreased by up-to 10.0°C. Temperature cycling is reduced from up-to 14.8°C to almost zero in comparison with a non-thermal-aware solution. For our benchmark applications with an irregular access pattern, the second technique is able to reduce the peak temperature by up-to 3.5°C. These savings for both techniques are obtained without any performance degradation or extra silicon area.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127418229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}