International Conference on Hardware/Software Codesign and System Synthesis: latest papers, page 2

A standby-sparing technique with low energy-overhead for fault-tolerant hard real-time systems

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629463

A. Ejlali, B. Al-Hashimi, P. Eles

Abstract: Time redundancy (rollback-recovery) and hardware redundancy are commonly used in real-time systems to achieve fault tolerance. From an energy consumption point of view, time redundancy is generally preferable to hardware redundancy. However, hard real-time systems often use hardware redundancy to meet the high reliability requirements of safety-critical applications. In this paper we propose a hardware-redundancy technique with low energy overhead for hard real-time systems. The proposed technique is based on standby-sparing, where the system is composed of a primary unit and a spare. Through analytical models, we have developed an online energy-management method which uses a slack reclamation scheme to reduce the energy consumption of both the primary and spare units. In this method, dynamic voltage scaling (DVS) is used for the primary unit and dynamic power management (DPM) is used for the spare. We conducted several experiments to compare the proposed system with a fault-tolerant real-time system which uses time redundancy for fault tolerance and DVS with slack reclamation for low energy consumption. The results show that for relaxed time constraints, the proposed system provides up to 24% energy saving compared to the time-redundancy system. For tight deadlines, when the time-redundancy system can tolerate no faults, the proposed system preserves its fault tolerance but consumes about 32% more energy.
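The slack reclamation idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' analytical model: it only shows a primary unit lowering its frequency when earlier tasks finish early, and it does not model the spare or DPM. The function names, the normalized frequency range, and the energy model (energy per cycle proportional to the square of the frequency, assuming voltage scales linearly with frequency) are all assumptions made for illustration.

```python
def scaled_frequency(wcet, available_time, f_min=0.25):
    """Lowest normalized frequency (1.0 = full speed) at which the
    worst-case execution time still fits in the available time.
    f_min is a hypothetical lower bound on the frequency."""
    if available_time <= 0:
        raise ValueError("available_time must be positive")
    return min(1.0, max(f_min, wcet / available_time))


def run_with_slack_reclamation(tasks, f_min=0.25):
    """tasks: list of (wcet, actual, slot), all measured at full speed.
    Each task owns a static time slot; time left over by earlier
    tasks (slack) is handed to the next task so it can run slower."""
    slack = 0.0
    energy = 0.0
    for wcet, actual, slot in tasks:
        available = slot + slack
        f = scaled_frequency(wcet, available, f_min)
        elapsed = actual / f          # slower clock stretches execution
        energy += actual * f ** 2     # assumed model: cycles * f^2
        slack = available - elapsed   # unused time is reclaimed
    return energy, slack


# Task 1 finishes early, so task 2 can run at half speed.
tasks = [(2.0, 1.0, 4.0), (2.0, 2.0, 2.0)]
energy, leftover = run_with_slack_reclamation(tasks)
```

Under this assumed model, the second task would cost 2.0 energy units at full speed in its own slot, but only 0.5 units once it inherits the slack left by the first task.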

Citations: 64

LOP: a novel SRAM-based architecture for low power and high throughput packet classification

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629455

Xin He, Jorgen Peddersen, S. Parameswaran

Abstract: Packet classification has become an important problem to solve in modern network processors used in networking embedded systems such as routers. Algorithms for matching incoming packets from the network to pre-defined rules have been proposed by a number of researchers. Current software-based packet classification techniques have low performance, prompting many researchers to move their focus to new architectures encompassing both software and hardware components. Some of the newer hardware architectures exclusively utilize Ternary Content Addressable Memory (TCAM) to improve the performance of rule matching. However, this results in systems with high power consumption. TCAM consumes a large amount of power because it reads the entire memory array during every access, much of which is unnecessary. In this paper, we propose LOP, a novel SRAM-based architecture where incoming packets are compared against parts of all rules simultaneously until a single matching rule is found for the compared bits in the packets. This method significantly reduces power consumption, as only a segment of the memory is compared against the incoming packet. Despite the additional time penalty to match a single packet, parallel comparison of multiple packets can improve throughput beyond that of the TCAM approaches, while consuming significantly less power. Nine different benchmarks were tested in two classification systems, with results showing that LOP architectures provide high lookup rates, high throughput, and low power consumption. Compared with a state-of-the-art TCAM implementation (throughput of 495 million searches per second (Msps)) in 65 nm CMOS technology, LOP on average saves 43% of energy consumption with a throughput of 590 Msps.
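The segment-by-segment comparison described in the abstract can be sketched in software as follows. This is only an illustration of the general idea, not the LOP hardware: rules are modelled as ternary strings over `0`, `1` and `*`, the segment width `seg_bits` is a hypothetical parameter, and the final check of the surviving rule against the whole packet is an assumption added so that the sketch behaves as a correct classifier.

```python
def segment_matches(rule_seg, packet_seg):
    """A ternary segment matches when every rule bit is '*' or
    equals the corresponding packet bit."""
    return all(r == "*" or r == p for r, p in zip(rule_seg, packet_seg))


def classify(rules, packet, seg_bits=4):
    """Compare the packet against one segment of all rules at a time,
    stopping as soon as at most one candidate rule remains.
    Returns (rule_index or None, number_of_segments_read)."""
    candidates = list(range(len(rules)))
    segments_read = 0
    for start in range(0, len(packet), seg_bits):
        end = start + seg_bits
        candidates = [
            i for i in candidates
            if segment_matches(rules[i][start:end], packet[start:end])
        ]
        segments_read += 1
        if len(candidates) <= 1:
            break  # no need to read further segments of every rule
    # Assumed extra step: confirm the survivor on the full packet.
    # If several rules survive, the lowest index is taken as highest priority.
    if candidates and segment_matches(rules[candidates[0]], packet):
        return candidates[0], segments_read
    return None, segments_read


rules = ["0000****", "1010*1*1", "1111000*"]
match, reads = classify(rules, "10100101")
```

In this example only the first 4-bit segment of each rule has to be read before a single candidate remains, which mirrors why reading a segment of the memory rather than the whole array can save power.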

Citations: 7

A DP-network for optimal dynamic routing in network-on-chip

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629452

T. Mak, P. Cheung, W. Luk, K. Lam

Abstract: Dynamic routing is desirable because of its substantial improvement in communication bandwidth and intelligent adaptation to faulty links and congested traffic. However, implementation of adaptive routing in a network-on-chip (NoC) system is not trivial and is further complicated by the requirements of deadlock freedom and real-time optimal decision making. In this paper, we present a deadlock-free routing architecture which employs a dynamic programming (DP) network to provide on-the-fly optimal path planning and network monitoring for packet switching. Also, a new routing strategy called k-step look ahead is introduced. This new strategy can substantially reduce the size of the routing table and maintain a high quality of adaptation, which leads to a scalable dynamic routing solution with minimal hardware overhead. Our results based on a cycle-accurate simulator demonstrate the effectiveness of the DP-network, which outperforms both the deterministic and adaptive routing algorithms in average delay on various traffic scenarios by 22.3%. Moreover, the hardware overhead for the DP-network is insignificant based on the results obtained from the hardware implementations.
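The dynamic programming formulation behind this kind of routing can be sketched with the standard Bellman recurrence, in which each node's cost to the destination is the minimum over its neighbours of the link cost plus the neighbour's cost. The sketch below is a generic software illustration under that assumption; it does not model the paper's hardware DP-network, its k-step look-ahead strategy, or deadlock avoidance, and the function name and the `links` data layout are hypothetical.

```python
import math


def dp_costs(links, dest):
    """links: {node: {neighbour: link_cost}} for directed links.
    Returns (cost_to_dest, next_hop), where each node's cost satisfies
    cost[n] = min over neighbours m of (links[n][m] + cost[m])."""
    nodes = sorted(set(links) | {m for nbrs in links.values() for m in nbrs})
    cost = {n: math.inf for n in nodes}
    cost[dest] = 0.0
    next_hop = {}
    # At most len(nodes) - 1 relaxation sweeps are needed to converge.
    for _ in range(max(len(nodes) - 1, 1)):
        changed = False
        for n in nodes:
            if n == dest:
                continue
            for m, c in links.get(n, {}).items():
                if c + cost[m] < cost[n]:
                    cost[n] = c + cost[m]
                    next_hop[n] = m
                    changed = True
        if not changed:
            break
    return cost, next_hop


# A 2x2 mesh; link costs stand in for measured congestion.
mesh = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "D": 1},
    "C": {"A": 4, "D": 1},
    "D": {"B": 1, "C": 1},
}
cost, hop = dp_costs(mesh, "D")
```

Re-running the computation after a link cost changes (for example, raising the cost of the A to B link to model congestion) yields a new next hop for A, which is the adaptive behaviour the abstract refers to.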

Citations: 25

A tuneable software cache coherence protocol for heterogeneous MPSoCs

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629488

Frank E. B. Ophelders, M. Bekooij, H. Corporaal

Abstract: In a multiprocessor system-on-chip (MPSoC), private caches introduce the cache coherence problem. Here, we target heterogeneous MPSoCs with a network-on-chip (NoC). Existing hardware cache coherence protocols are less suitable for MPSoCs because many off-the-shelf processors used in MPSoCs do not support these protocols. Furthermore, these protocols typically rely on global visibility and serialization of writes, which does not match well with the parallel point-to-point communication provided by a NoC. Therefore, we propose a software cache coherence protocol, which can be applied in a heterogeneous MPSoC with a NoC. The software cache coherence protocol relies on explicit synchronization in the software. More specifically, caches are guaranteed to be coherent according to the Release Consistency model, on top of which we have implemented the standard Pthreads communication library. Heterogeneous MPSoCs with off-the-shelf processors can easily be supported, because processors are only required to provide cache control operations, e.g., clean and invalidate. All cache coherence operations are interruptible and do not impact the execution of tasks on other processors; therefore, this protocol is suitable for predictable MPSoCs. Our software cache coherence protocol is implemented on an ARM926EJ-S MPSoC which is mapped on an FPGA. From experiments we conclude that the protocol overhead is low for the applications taken from the SPLASH-2 benchmark set. For these applications we observed a speedup between 1.89 and 2.01 on the two-processor MPSoC.
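How clean and invalidate operations can provide coherence at synchronization points may be easier to see in a toy simulation. The sketch below is an assumption-laden illustration, not the paper's protocol: the `Cache` class, the `release` and `acquire` helpers, and the choice to keep dirty lines on invalidate are all hypothetical, and shared memory is modelled as a plain dictionary.

```python
class Cache:
    """Toy private write-back cache in front of a shared memory dict."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}     # address -> cached value
        self.dirty = set()  # addresses written but not yet written back

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.dirty.add(addr)

    def clean(self):
        """Write all dirty lines back to shared memory."""
        for addr in self.dirty:
            self.memory[addr] = self.lines[addr]
        self.dirty.clear()

    def invalidate(self):
        """Drop clean lines so later reads refetch from memory.
        Assumption: dirty lines are kept so local writes are not lost."""
        self.lines = {a: v for a, v in self.lines.items() if a in self.dirty}


def release(cache):
    cache.clean()       # make local writes visible before unlocking


def acquire(cache):
    cache.invalidate()  # discard possibly stale copies after locking


memory = {}
p0, p1 = Cache(memory), Cache(memory)
p1.read(0)            # p1 caches the old value
p0.write(0, 42)
release(p0)           # p0 publishes its write
stale = p1.read(0)    # still the old cached copy
acquire(p1)
fresh = p1.read(0)    # refetched from shared memory
```

The point of the sketch is that no hardware snooping is involved: a reader only observes another processor's write after the writer has performed a release and the reader an acquire, which is the guarantee Release Consistency asks for.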

Citations: 7

Cycle count accurate memory modeling in system level design

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629475

Y. Lo, Mao Lin Li, R. Tsay

Citations: 8

Minimization of the reconfiguration latency for the mapping of applications on FPGA-based systems

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629480

V. Rana, S. Murali, David Atienza Alonso, M. Santambrogio, L. Benini, D. Sciuto

Abstract: Field-Programmable Gate Arrays (FPGAs) have become a promising mapping fabric for the implementation of System-on-Chip (SoC) platforms, due to their large capacity and their enhanced support for dynamic and partial reconfigurability. Design automation support for partial reconfigurability includes several key challenges. In particular, reconfiguration algorithms need to be developed to effectively exploit the available area and run-time reconfiguration support for instantiating at run-time the hardware components needed to execute multiple applications concurrently. These new algorithms must be able to achieve maximum application execution performance at a minimum reconfiguration overhead.

In this work, we propose a novel design flow that minimizes the number of core reconfigurations needed to map multiple applications dynamically (i.e., using run-time reconfiguration) on FPGAs. This new mapping flow features a multi-stage design optimization algorithm that makes it possible to reduce the reconfiguration latency by up to 43%, by taking into account the reconfiguration costs and SoC block reuse between the different applications that need to be executed dynamically on the FPGA. Moreover, we show that the proposed multi-stage optimization algorithm explores a large set of mapping trade-offs, by taking into account the traffic flows for each application, the run-time reconfiguration costs and the number of reconfigurable regions available on the FPGA.

Citations: 25

A scalable parallel H.264 decoder on the cell broadband engine architecture

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629484

Michael A. Baker, Pravin Dalale, Karam S. Chatha, S. Vrudhula

Citations: 22

A variation-tolerant scheduler for better than worst-case behavioral synthesis

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629467

J. Cong, Albert Liu, B. Liu

Citations: 8

An on-chip interconnect and protocol stack for multiple communication paradigms and programming models

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629450

Andreas Hansson, K. Goossens

Abstract: A growing number of applications, with diverse requirements, are integrated on the same System on Chip (SoC) in the form of hardware and software Intellectual Property (IP). The diverse requirements, coupled with the IPs being developed by unrelated design teams, lead to multiple communication paradigms, programming models, and interface protocols that the on-chip interconnect must accommodate.

Traditionally, on-chip buses offer distributed shared memory communication with established memory-consistency models, but are tightly coupled to a specific interface protocol. On-chip networks, on the other hand, offer layering and interface abstraction, but are centred around point-to-point streaming communication, and do not address issues at the higher layers in the protocol stack, such as memory-consistency models and message-dependent deadlock.

In this work we introduce an on-chip interconnect and protocol stack that combines streaming and distributed shared memory communication. The proposed interconnect offers an established memory-consistency model and does not restrict any higher-level protocol dependencies. We present the protocol stack and the architectural blocks and quantify the cost, both on the block level and for a complete SoC. For a multi-processor multi-application SoC with multiple communication paradigms and programming models, our proposed interconnect occupies only 4% of the chip area.

Citations: 27

On compile-time evaluation of process partitioning transformations for Kahn process networks

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629441

Sjoerd Meijer, Hristo Nikolov, T. Stefanov

Citations: 9