2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS): Latest Papers

FastFwd: An efficient hardware acceleration technique for trace-driven network-on-chip simulation

2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) Pub Date : 2010-10-24 DOI: 10.1145/1878961.1879006

G. Krishnaiah, B. Silpa, P. Panda, Anshul Kumar

Abstract: We present an efficient emulation-based technique to accelerate architecture exploration of networks-on-chip (NoCs). The large NoC design space, together with growing design complexity, leads to low simulation speeds on host machines and has motivated hardware accelerators for speeding up simulation. For example, simulating applications with real-life problem sizes could take weeks on a host machine. FPGA acceleration is a promising strategy for speeding up NoC simulations by several orders of magnitude. However, NoC exploration requires simulating a few billion network transactions of the application, and this could still take tens of minutes even with an FPGA-based emulator. With the increasing complexity of architectures and applications, reducing emulation time is a key concern. We propose a technique, FastFwd, to minimize emulation time by efficiently identifying and eliminating redundant cycles during a trace-based NoC simulation. We have studied the implications of the additional FPGA hardware required to implement our technique. A naïve implementation could lead to poor scalability and increase the required DRAM bandwidth, both of which ultimately reduce emulation speed. We propose a hierarchical controller architecture to resolve the scalability issue, and a compressed representation of traces to mitigate the increased DRAM bandwidth requirement. Our experiments with several benchmarks show that FPGA emulation with our technique reduces the average emulation time by a factor of 2 compared to conventional emulation.

Citations: 5
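The FastFwd abstract above describes eliminating redundant cycles during trace-based NoC simulation but does not say how those cycles are identified. The following is a minimal software sketch of the general idea only, not the authors' hardware design. It assumes that a cycle is "redundant" when no packet is in flight and no trace event is due, and it models each packet with a fixed, hypothetical `latency`. The names `TraceEvent` and `simulate` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEvent:
    cycle: int    # cycle at which the packet is injected (from the trace)
    latency: int  # hypothetical: fixed number of cycles the packet stays in the network


def simulate(trace, fast_forward):
    """Replay a trace and return (final_cycle, cycles_actually_stepped)."""
    pending = sorted(trace, key=lambda e: e.cycle)
    in_flight = []  # remaining cycles for each packet currently in the network
    cycle = 0
    stepped = 0
    i = 0
    while i < len(pending) or in_flight:
        # Assumed redundancy criterion: the network is empty and the next
        # injection lies in the future, so the cycles in between do nothing.
        if fast_forward and not in_flight and pending[i].cycle > cycle:
            cycle = pending[i].cycle
        # Inject every packet whose trace timestamp has been reached.
        while i < len(pending) and pending[i].cycle <= cycle:
            in_flight.append(pending[i].latency)
            i += 1
        # Advance the network by one cycle and retire finished packets.
        in_flight = [r - 1 for r in in_flight if r - 1 > 0]
        cycle += 1
        stepped += 1
    return cycle, stepped


trace = [TraceEvent(cycle=0, latency=3), TraceEvent(cycle=100, latency=2)]
print(simulate(trace, fast_forward=False))
print(simulate(trace, fast_forward=True))
```

Both runs finish at the same simulated cycle, but the fast-forwarding run steps through far fewer cycles because the idle gap between the two injections is skipped. A real emulator would also have to account for in-network state such as buffered flits and contention, which this sketch ignores.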

Exploring programming model-driven QoS support for NoC-based platforms

2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) Pub Date : 2010-10-24 DOI: 10.1145/1878961.1878977

Jaume Joven, A. Marongiu, F. Angiolini, L. Benini, G. Micheli

Abstract: Networks-on-Chip (NoCs) are increasingly considered a central enabling technology for communication-centric designs as more and more IP blocks are integrated on the same SoC. Embedded applications, in turn, are becoming extremely sophisticated, and often require guaranteed levels of service and performance. The complex and non-uniform nature of network traffic generated by parallel applications running on a large number of possibly heterogeneous IPs makes a strong case for providing Quality of Service (QoS) support for traffic streams over the NoC infrastructure. In this paper we consider an integrated hardware/software approach for delivering QoS at the application level. We designed NoC hardware support, low-level middleware, and APIs that enable QoS control at the application level. Furthermore, we identify a set of programming abstractions useful for associating a priority with each running task in the system. An initial implementation of this programming model is also presented, which leverages a set of extensions to an MPSoC-specific OpenMP compiler and run-time environment.

Citations: 9

High durability in NAND flash memory through effective page reuse mechanisms

2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) Pub Date : 2010-10-24 DOI: 10.1145/1878961.1878999

Kwangyoon Lee, A. Orailoglu

Citations: 1

NeuroNoC: Neural network inspired runtime adaptation for an on-chip communication architecture

2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) Pub Date : 2010-10-24 DOI: 10.1145/1878961.1879002

T. Ebi, M. A. Faruque, J. Henkel

Citations: 4

An introduction to the SystemC synthesis subset standard

2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) Pub Date : 2010-10-24 DOI: 10.1145/1878961.1878993

P. Coussy, A. Takach, Michael McNamara, M. Meredith

Citations: 8

Embedded market: Challenges and opportunities

2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) Pub Date : 2010-10-24 DOI: 10.1145/1878961.1878963

V. Ilderem

Citations: 0

Unconventional fabrics, architectures, and models for future multi-core systems

2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) Pub Date : 2010-10-24 DOI: 10.1145/1878961.1879017

R. Marculescu, C. Teuscher, P. Pande

Abstract: Massive levels of integration are making modern multi-core chips pervasive in several domains. Hence, high performance, robustness, and low power are crucial for the widespread adoption of such platforms. However, achieving all these goals forces us to rethink the basis of designing multi-core systems at nanoscale, starting with the very substrate we need to use to implement such systems in the future, particularly for nanowire (or carbon nanotube) based on-chip interconnect obtained through self-assembly techniques. Due to the lack of control over these processes, such interconnects are expected to be largely unstructured. While large unstructured networks are easy to fabricate, they require unconventional architectures and communication paradigms. For instance, taking inspiration from the many natural systems with network-based architectures, future multi-core systems at nanoscale are expected to be hierarchical and heterogeneous, because many powerful features of natural networks, such as increased performance, better resource utilization, and increased robustness against failures, come precisely from their heterogeneity, unstructuredness, and hierarchical nature. As such, an important performance limitation of multi-core chips designed with regular network architectures arises from planar metal interconnect-based multi-hop links, where data transfer between two distant blocks can cause high latency and power consumption.

Citations: 0

Automatic parallelization of embedded software using hierarchical task graphs and integer linear programming

2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) Pub Date : 2010-10-24 DOI: 10.1145/1878961.1879009

D. Cordes, P. Marwedel, A. Mallik

Abstract: Recent years have shown that the advantages of multiprocessor System-on-Chip (MPSoC) architectures in the embedded systems domain cannot be disregarded. Using multiple cores in a single system makes it possible to close the gap between energy consumption, problems concerning heat dissipation, and computational power. Nevertheless, these benefits do not come for free. New challenges arise if existing applications have to be ported to these multiprocessor platforms. One of the most ambitious tasks is to extract efficient parallelism from these existing sequential applications. Hence, many parallelization tools have been developed, most of which extract as much parallelism as possible, which is in general not the best choice for embedded systems with their limitations in hardware and software support. In contrast to previous approaches, we present a new automatic parallelization tool tailored to the particular requirements of resource-constrained embedded systems. This paper presents an algorithm that automatically steers the granularity of the generated tasks with respect to architectural requirements and the overall execution time reduction. For this purpose, we exploit hierarchical task graphs to simplify a new integer linear programming based approach that splits up sequential programs efficiently. Results on real-life benchmarks show that the presented approach is able to speed up sequential applications by a factor of up to 3.7 on a four-core MPSoC architecture.

Citations: 56
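To put the reported best-case figure in context, the speedup of 3.7 on four cores from the abstract above can be turned into a parallel efficiency. The second line is only an illustration: it assumes the application follows Amdahl's law with a serial fraction $f$, a model the abstract itself does not invoke.

```latex
% Parallel efficiency from the reported speedup S = 3.7 on p = 4 cores
E = \frac{S}{p} = \frac{3.7}{4} = 0.925

% Assumption: Amdahl's law, S = \frac{1}{f + (1 - f)/p}, solved for the serial fraction f
f = \frac{p/S - 1}{p - 1} = \frac{4/3.7 - 1}{3} \approx 0.027
```

Under that assumed model, the best reported result corresponds to about 92.5% efficiency and to roughly 2.7% of the original execution time remaining sequential.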

An elastic software cache with fast prefetching for Motion Compensation in video decoding

2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) Pub Date : 2010-10-24 DOI: 10.1145/1878961.1878967

P. Chao, Y. Lin

Citations: 3

A performance model and code overlay generator for scratchpad enhanced embedded processors

2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) Pub Date : 2010-10-24 DOI: 10.1145/1878961.1879011

Michael A. Baker, Amrit Panda, Nikhil Ghadge, Aniruddha Kadne, Karam S. Chatha

Abstract: Software-managed scratchpad memories (SPMs) provide improved performance and power in embedded processors by reducing required hardware resources. Performance depends strongly on the scheme used to map code and data onto the SPM, but generating optimal mappings can be extremely difficult. Here we address instruction mapping on SPMs and present a performance model and algorithm, “Code Overlay Generator” (COG), for producing high-performance dynamic SPM code mappings. Our heuristic does not require profiling information, and is suitable for generating mapping solutions for large programs, for which previously proposed Integer Linear Programming (ILP) techniques are infeasible. We compare our algorithm with a published heuristic and with the code overlay mapping algorithm provided with spu-gcc, IBM's compiler for the Cell Broadband Engine (CBE) Synergistic Processing Unit (SPU). We find an average performance advantage of 34% compared to the previous algorithm, and 87% with respect to spu-gcc. We additionally show that our performance model enables improved tools for offline evaluation of code overlay performance and mapping selection.

Citations: 21