International Conference on Hardware/Software Codesign and System Synthesis: latest papers, page 3

Applying network calculus for performance analysis of self-similar traffic in on-chip networks

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629497

Yue Qian, Zhonghai Lu, Wenhua Dou

Abstract: On-chip traffic of many applications exhibits self-similar characteristics. In this paper, we apply network calculus to analyze the delay and backlog bounds for self-similar traffic in networks on chips. We first prove that self-similar traffic cannot be constrained by any deterministic arrival curve. We then prove that self-similar traffic can be constrained by deterministic linear arrival curves $\alpha_{r,b}(t) = rt + b$ ($r$: rate, $b$: burstiness) if an additional parameter, the excess probability $\varepsilon$, is used to capture the burstiness that exceeds the arrival envelope. This three-parameter model, $\varepsilon\text{-}\alpha_{r,b}(t) = rt + b(\varepsilon)$, enables us to apply and extend the results of network calculus to analyze the performance and buffering cost of networks delivering self-similar traffic flows. Assuming the latency-rate server model for the network elements, we give closed-form equations to compute the delay and backlog bounds for self-similar traffic traversing a series of network elements. Furthermore, we describe a performance analysis flow with self-similar traffic as input. Our experimental results using real on-chip multimedia traffic traces validate our model and approach.
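The abstract above assumes latency-rate servers and linear arrival curves. As a rough illustration, the classical deterministic network-calculus bounds for that setting can be sketched in Python as follows. This is not the paper's closed-form result: the burstiness b is taken as a given number (in the paper it depends on the excess probability ε, and that dependence is not reproduced here), and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LatencyRateServer:
    """Service curve beta(t) = rate * max(t - latency, 0)."""
    rate: float      # guaranteed service rate R
    latency: float   # service latency T


def concatenate(servers: Sequence[LatencyRateServer]) -> LatencyRateServer:
    """Series of latency-rate servers: the minimum rate and the summed latency."""
    return LatencyRateServer(
        rate=min(s.rate for s in servers),
        latency=sum(s.latency for s in servers),
    )


def _check_stable(r: float, server: LatencyRateServer) -> None:
    # The bounds are only finite when the arrival rate does not exceed the service rate.
    if r > server.rate:
        raise ValueError("arrival rate exceeds service rate; bounds are unbounded")


def delay_bound(r: float, b: float, server: LatencyRateServer) -> float:
    """Delay bound for arrival curve r*t + b: T + b / R."""
    _check_stable(r, server)
    return server.latency + b / server.rate


def backlog_bound(r: float, b: float, server: LatencyRateServer) -> float:
    """Backlog bound for arrival curve r*t + b: b + r * T."""
    _check_stable(r, server)
    return b + r * server.latency


# Illustrative numbers only (assumed, not taken from the paper).
path = concatenate([LatencyRateServer(2.0, 1.0), LatencyRateServer(4.0, 0.5)])
print(delay_bound(1.0, 4.0, path), backlog_bound(1.0, 4.0, path))
```

Concatenating the servers before computing the bound accounts for the burst only once for the whole path, which is why end-to-end analysis is usually done on the combined service curve rather than hop by hop.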

Citations: 17

FlexRay schedule optimization of the static segment

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629485

M. Lukasiewycz, M. Glaß, J. Teich, Paul Milbredt

Abstract: The FlexRay bus is the prospective automotive standard communication system. For the sake of high flexibility, the protocol includes a static time-triggered segment and a dynamic event-triggered segment. This paper is dedicated to the scheduling of the static segment in compliance with the automotive-specific AUTOSAR standard. For the determination of an optimal schedule in terms of the number of used slots, a fast greedy heuristic as well as a complete approach based on Integer Linear Programming are presented. For this purpose, a scheme for the transformation of the scheduling problem into a bin packing problem is proposed. Moreover, a metric and optimization method for the extensibility of partially used slots is introduced. Finally, the provided experimental results give evidence of the benefits of the proposed methods. On a realistic case study, the proposed methods obtain better results in a significantly smaller amount of time than a commercial tool. Additionally, the experimental results provide a case study on incremental scheduling, a scalability analysis, an exploration use case, and an additional test case to emphasize the robustness and flexibility of the proposed methods.
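The abstract above maps slot scheduling onto bin packing and mentions a fast greedy heuristic. As a generic illustration of greedy bin packing, the sketch below uses first-fit decreasing, a standard textbook heuristic that is not necessarily the one used in the paper. The frame names, the integer "size" of a frame, and the slot capacity are all hypothetical placeholders.

```python
from typing import Dict, List


def first_fit_decreasing(sizes: Dict[str, int], capacity: int) -> List[List[str]]:
    """Pack items into as few bins (slots) as the greedy rule manages.

    Items are visited from largest to smallest; each goes into the first
    slot that still has room, and a new slot is opened when none fits.
    """
    slots: List[List[str]] = []   # item names per slot
    free: List[int] = []          # remaining capacity per slot
    # Sort by descending size; ties are broken by name for determinism.
    for name, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        if size <= 0 or size > capacity:
            raise ValueError(f"item {name!r} cannot fit in any slot")
        for i, room in enumerate(free):
            if size <= room:
                slots[i].append(name)
                free[i] -= size
                break
        else:
            # No existing slot had room, so a new one is opened.
            slots.append([name])
            free.append(capacity - size)
    return slots


# Hypothetical frames and capacity, for illustration only.
frames = {"a": 32, "b": 32, "c": 16, "d": 16, "e": 64}
print(first_fit_decreasing(frames, capacity=64))
```

A greedy pass like this is fast but gives no optimality guarantee, which is the usual reason to pair it with an exact method such as an ILP formulation.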

Citations: 121

FRA: a flash-aware redundancy array of flash storage devices

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629459

Yangsup Lee, Sanghyuk Jung, Y. Song

Abstract: Since flash memory has many attractive characteristics such as high performance, non-volatility, low power consumption and shock resistance, it has been widely used as storage media in embedded and computer system environments. In terms of reliability, however, flash memory has shortcomings: potentially high I/O latency due to erase-before-write and poor durability due to limited erase cycles. To overcome these problems, a RAID technique borrowed from hard-disk-based storage technology is employed. With RAID, multi-bit burst failures in a page, block or device are easily detected and corrected, so reliability can be significantly enhanced. However, the existing RAID-5 scheme for flash-based storage suffers from delayed response times caused by parity updating. To overcome this problem, we propose a novel approach using a RAID technique in flash storage, called Flash-aware Redundancy Array. In this approach, parity updates are postponed so that they are not included in the critical path of read and write operations. Instead, they are scheduled for when the device becomes idle. For example, the proposed scheme shows a 19% improvement in the average write response time, compared to other approaches.
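The idea of taking parity updates off the write path, as described in the abstract above, can be illustrated with a small in-memory model. This is only a sketch under simplifying assumptions (one XOR parity block per stripe, no parity rotation, no flash translation layer), and the class and method names are hypothetical rather than taken from the paper.

```python
from functools import reduce
from typing import List, Set


class DeferredParityArray:
    """Toy stripe set whose XOR parity is refreshed only when idle."""

    def __init__(self, num_stripes: int, data_devices: int, block_size: int) -> None:
        self.block_size = block_size
        self.data: List[List[bytes]] = [
            [bytes(block_size) for _ in range(data_devices)]
            for _ in range(num_stripes)
        ]
        self.parity: List[bytes] = [bytes(block_size) for _ in range(num_stripes)]
        self.dirty: Set[int] = set()   # stripes whose parity is stale

    @staticmethod
    def _xor(blocks: List[bytes]) -> bytes:
        # Byte-wise XOR across all blocks.
        return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

    def write(self, stripe: int, device: int, block: bytes) -> None:
        """Fast path: store the data and only mark the stripe as dirty."""
        if len(block) != self.block_size:
            raise ValueError("block has the wrong size")
        self.data[stripe][device] = block
        self.dirty.add(stripe)

    def flush_when_idle(self) -> None:
        """Slow path, meant to run in idle periods: recompute stale parity."""
        for stripe in sorted(self.dirty):
            self.parity[stripe] = self._xor(self.data[stripe])
        self.dirty.clear()

    def rebuild(self, stripe: int, lost_device: int) -> bytes:
        """Recover one lost block from the survivors and the parity block."""
        if stripe in self.dirty:
            raise RuntimeError("parity is stale for this stripe")
        survivors = [b for i, b in enumerate(self.data[stripe]) if i != lost_device]
        return self._xor(survivors + [self.parity[stripe]])
```

The sketch makes the trade-off visible: until `flush_when_idle` runs, a dirty stripe cannot be rebuilt from parity, so any real design has to cover that window in some way that this toy model does not attempt.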

Citations: 76

Using binary translation in event driven simulation for fast and flexible MPSoC simulation

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629446

M. Gligor, Nicolas Fournel, F. Pétrot

Citations: 73

Scalable and retargetable simulation techniques for multiprocessor systems

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629448

Heekyung Kim, Dukyoung Yun, S. Ha

Abstract: For design space exploration of embedded systems, a virtual prototyping system is commonly used to verify the expected performance as well as the functionality before a hardware prototype is built. For accurate performance estimation, a virtual prototyping system is constructed by replacing real processing components with component simulators running concurrently. In such a distributed simulation system, the overhead of communication and synchronization between the component simulators increases in proportion to the number of simulators when lock-step synchronization is used. As a result, the simulation performance degrades significantly as the number of processors integrated in a chip increases. To overcome this problem, we propose a scalable and retargetable simulation technique that boosts the simulation performance significantly by attaching a simulator wrapper to each component simulator. The simulator wrapper performs synchronization between the simulators and the simulation backplane on behalf of the associated simulator. Use of the simulator wrapper also makes the proposed simulation platform retargetable, since a third-party simulator like ARMulator can be integrated into the simulation environment through a wrapper without modification. In addition, it enables parallel simulation that achieves almost linear speed-up as the number of processor cores in the simulation host increases. Experiments with a multimedia CODEC application and other applications, varying the number of processor simulators from 1 to 16, show that the simulation performance remains constant. Scalable performance from parallel simulation is also confirmed by experiments.

Citations: 12

Supporting RTL flow compatibility in a microarchitecture-level design framework

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629482

Daniel Schwartz-Narbonne, C. Chan, Yogesh S. Mahajan, S. Malik

Abstract: Current RTL-based design methodologies face significant scaling challenges related to the difficulty of designing, modifying, and verifying RTL. RTL contains primarily low-level structural information about the design. In contrast, the microarchitecture level is much closer to the specification level, making it an effective entry point for hardware design. The explicit description of the high-level units of work is also beneficial for verification. Currently used models for high-level design have very complex semantics. In this paper, we present a microarchitectural modeling language with simpler semantics. We demonstrate that it results in a significantly simpler synthesis to Verilog, providing for integration with existing RTL flows. Moreover, the simple semantics of the model enable the generation of PSL assertions for functionally verifying the correctness of the synthesis. We demonstrate the efficacy of this approach through two case studies: a router switch and a processor design. We synthesized both designs and formally verified the synthesis using the generated assertions.

Citations: 3

Automatic customization of device drivers for IP-cores used with assorted CPU organizations

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629460

A. Acquaviva, N. Bombieri, F. Fummi, S. Vinco

Abstract: Plugging an IP core into an embedded platform implies the generation of a device driver that complies with the IP communication protocol on one side and with the CPU organization (i.e., single processor, SMP, AMP) on the other. Reusing an existing driver developed for a different CPU organization requires a time-consuming and error-prone manual customization, which discourages the evaluation of alternative target platform organizations. In this context, the paper first proposes to extract the formal model of the IP communication protocol from the RTL testbench provided with it. Then a taxonomy of device drivers is presented, based on the CPU organization of the platform. This taxonomy allows selecting the correct template used to automatically generate a device driver that is compliant with the CPU organization, with use in a simulated or a real platform, with the interrupt support, with the operating system, with the I/O architecture, and with possible parallel execution. The proposed methodology has been successfully tested on a family of embedded platforms with different CPU organizations.

Citations: 4

Bottom-up performance analysis considering time slice based software scheduling at system level

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629493

A. Viehl, M. Pressler, O. Bringmann

Citations: 4

ESL power analysis of embedded processors for temperature and reliability estimations

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629469

Björn Sander, Jürgen Schnerr, O. Bringmann

Abstract: The ongoing scaling of CMOS technology facilitates the design of systems with continuously increasing functionality, but it also raises the susceptibility of these systems to reliability issues caused by high power densities and temperatures. For reasons of complexity, the Electronic System Level (ESL) is gaining importance as the starting point of design. Design alternatives are evaluated at ESL with respect to several design objectives, lately also including temperature. But temperatures are dominated by local power effects, a fact that has not been sufficiently reflected at ESL until now. There is a lack of appropriate models, which we call the ESL Power Density Gap. The contributions of this paper are twofold. First, we describe why the ESL Power Density Gap should be closed. In doing so, we want to stimulate a discussion. After that, we introduce a new ESL methodology for the power analysis of embedded processors, which can be considered a first step toward solving the aforementioned problem. It allows the generation of executable system models from a platform description, combining a functionality representation and component characterizations. Using an example application, it is shown that high power densities, usually invisible at ESL, can be uncovered by applying the proposed approach.

Citations: 16

MinDeg: a performance-guided replacement policy for run-time reconfigurable accelerators

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629481

L. Bauer, M. Shafique, J. Henkel

Abstract: Reconfigurable Processors utilize a reconfigurable fabric (to implement application-specific accelerators) and may perform run-time reconfigurations to exchange the set of deployed accelerators during application execution. Depending on the application requirements, the high utilization of the reconfigurable fabric (due to run-time reconfiguration) leads to a performance improvement compared to non-reconfigurable application-specific processors (ASIPs). However, as the reconfiguration time of fine-grained reconfigurable fabrics (i.e. FPGA-like structures) is rather long (in the range of milliseconds), it is crucial to avoid frequent cycles of reconfiguration-replacement-reconfiguration of the accelerators in order to exploit the real benefits of Reconfigurable Processors. Similar to memory caches, a replacement policy has to decide which reconfigurable accelerators shall be replaced in order to reconfigure additional accelerators. If a recently replaced accelerator is demanded again, the reconfiguration delay might noticeably increase the application execution time. In this paper, we demonstrate that well-known policies for cache and page replacement (typically also used in state-of-the-art Reconfigurable Processors) are not generally suitable for replacing reconfigurable accelerators. We therefore propose our novel performance-guided Minimum Degradation (MinDeg) replacement policy, which particularly targets Reconfigurable Processors and replaces reconfigurable accelerators at run time. It accounts for the performance penalty that occurs due to the replacement of a certain accelerator. Comparisons with the most prominent replacement policies show the superiority of our approach. We evaluate and compare MinDeg for a wide range of reconfiguration bandwidths and reconfigurable fabric sizes and achieve a speedup of up to 2.26x (1.74x compared to the widely used LRU policy). The overhead introduced to achieve this speedup is minor in comparison to the obtained application acceleration, i.e. the highest observed overhead (to calculate our MinDeg replacement policy) affected the obtained application acceleration by only 0.30%. A parallel hardware implementation of our MinDeg algorithm demands only 4,440 gate equivalents, which corresponds to 64% of the average requirements of one real-world reconfigurable accelerator (note: multiple accelerators are demanded per kernel). However, our MinDeg policy does not rely on hardware support, i.e. a trade-off between the hardware requirements and the acceleration is possible.
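To make the contrast between a recency-based policy and a penalty-aware one concrete, here is a minimal Python sketch. It is not the MinDeg algorithm, whose details the abstract does not give. The penalty estimate (expected future uses times cycles saved per use) and all field names are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Accelerator:
    name: str
    last_used: int             # timestamp of the most recent use
    expected_uses: int         # assumed forecast of upcoming executions
    cycles_saved_per_use: int  # assumed gain over running without the accelerator


def lru_victim(loaded: Sequence[Accelerator]) -> Accelerator:
    """Classic least-recently-used choice: evict the oldest access."""
    return min(loaded, key=lambda a: a.last_used)


def min_penalty_victim(loaded: Sequence[Accelerator]) -> Accelerator:
    """Evict the accelerator whose removal is estimated to cost the least."""
    return min(loaded, key=lambda a: a.expected_uses * a.cycles_saved_per_use)


# Hypothetical accelerators and numbers, for illustration only.
loaded = [
    Accelerator("dct", last_used=10, expected_uses=500, cycles_saved_per_use=40),
    Accelerator("sad", last_used=90, expected_uses=5, cycles_saved_per_use=20),
    Accelerator("huff", last_used=50, expected_uses=100, cycles_saved_per_use=30),
]
print(lru_victim(loaded).name, min_penalty_victim(loaded).name)
```

In this made-up scenario the recency rule evicts an accelerator that is about to be used heavily, while the penalty estimate keeps it. That is the kind of mismatch a performance-guided policy is meant to avoid.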

Citations: 8