2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS): Latest Articles

FastFwd: An efficient hardware acceleration technique for trace-driven network-on-chip simulation
G. Krishnaiah, B. Silpa, P. Panda, Anshul Kumar
Abstract: We present an efficient emulation-based technique to accelerate architecture exploration of networks-on-chip (NoCs). The large design space of NoC along with its growing complexity that results in low simulation speeds on host machines have motivated the need for hardware accelerators for speeding up the simulation. For example, simulation of applications with real life problem sizes could take weeks on a host machine. FPGA acceleration is a promising strategy for speeding up NoC simulations by several orders of magnitude. However, it is required to simulate a few billion network transactions of the application during NoC exploration, and this could still take tens of minutes even with an FPGA-based emulator. With the increasing complexity of architectures and applications, reducing emulation time is a key concern. We propose a technique, FastFwd, to minimize emulation time by efficiently identifying and eliminating redundant cycles during a trace-based NoC simulation. We have studied the implications of the additional FPGA hardware required for implementing our technique. A naïve implementation could lead to poor scalability and increase the required DRAM bandwidth, both of which ultimately impact the emulation speed negatively. We propose a hierarchical controller architecture to resolve the scalability issue, and a compressed representation of traces for mitigating the increased DRAM bandwidth requirement. Our experiments with several benchmarks have shown that the FPGA emulation with our technique reduces the average emulation time by a factor of 2 when compared to a conventional emulation.
DOI: https://doi.org/10.1145/1878961.1879006
Published: 2010-10-24
Citations: 5
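The FastFwd abstract does not say how redundant cycles are detected, so the following is only a toy software sketch of the general idea. It assumes that a cycle is redundant when no packet is in flight and no trace entry is due at that cycle. The function `run_trace`, its fixed-latency network model, and the example trace are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def run_trace(injections: List[int], latency: int, fast_forward: bool) -> Tuple[int, int]:
    """Replay a trace of injection cycles through a toy network model.

    Assumption (not from the paper): every packet stays in the network for
    exactly `latency` cycles (latency >= 1) and packets do not interact.
    Returns (final_cycle, emulated_cycles), where emulated_cycles counts the
    cycles that were actually stepped through.
    """
    pending = sorted(injections)
    idx = 0                    # next trace entry to inject
    in_flight: List[int] = []  # remaining cycles for each packet in the network
    cycle = 0
    emulated = 0

    while idx < len(pending) or in_flight:
        # Fast-forward: the network is empty and the next injection lies in
        # the future, so every cycle in between would do no work.
        if fast_forward and not in_flight and pending[idx] > cycle:
            cycle = pending[idx]

        # Inject every packet scheduled for the current cycle.
        while idx < len(pending) and pending[idx] == cycle:
            in_flight.append(latency)
            idx += 1

        # Step one cycle: age the packets and retire the finished ones.
        in_flight = [r - 1 for r in in_flight if r > 1]
        cycle += 1
        emulated += 1

    return cycle, emulated


if __name__ == "__main__":
    trace = [0, 100, 1000]
    print(run_trace(trace, latency=5, fast_forward=False))
    print(run_trace(trace, latency=5, fast_forward=True))
```

Both modes end at the same simulated cycle; only the number of stepped cycles differs, which is the quantity a technique of this kind aims to reduce. A sparse trace benefits most, while a trace that keeps the network busy gains nothing in this model.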
Exploring programming model-driven QoS support for NoC-based platforms
Jaume Joven, A. Marongiu, F. Angiolini, L. Benini, G. Micheli
Abstract: Networks-on-Chip (NoCs) are being increasingly considered as a central enabling technology to communication-centric designs as more and more IP blocks are integrated on the same SoC. Embedded applications, in turn, are becoming extremely sophisticated, and often require guaranteed levels of service and performance. The complex and non-uniform nature of network traffic generated by parallel applications running on a large number of possibly heterogeneous IPs makes a strong case for providing Quality of Service (QoS) support for traffic streams over the NoC infrastructure. In this paper we consider an integrated hardware/software approach for delivering QoS at the application level. We designed NoC hardware support, low-level middleware and APIs which enable QoS control at the application level. Furthermore, we identify a set of programming abstractions useful to associate the notion of priority to each running task in the system. An initial implementation of this programming model is also presented, which leverages a set of extensions to a MPSoC-specific OpenMP compiler and run-time environment.
DOI: https://doi.org/10.1145/1878961.1878977
Published: 2010-10-24
Citations: 9
High durability in NAND flash memory through effective page reuse mechanisms
Kwangyoon Lee, A. Orailoglu
Abstract: In this paper, we introduce a highly effective page reuse mechanism to reduce the amount of block erasures and page programming in NAND based primary memory architectures. The proposed techniques provide a very high rate of page reuse by effectively incorporating bit differences in page updates along with a reduction in bit unprogrammability by minimizing programming interference among adjacent pages. We also propose an effective block reclamation scheme to alleviate overall programming stress in a block so as to reduce the probability of run-time cell defects. The page reordering scheme can further increase page reusability by reducing run-time programming disturbance. The experimental results show that our proposed techniques significantly diminish the amount of block reclamation and consequently enhance the durability of the NAND flash based storage systems. Furthermore, by alleviating overall bit stress in NAND flash memory, the probability of bit failure of each cell is also significantly reduced, enabling the construction of more reliable and durable NAND flash based memory.
DOI: https://doi.org/10.1145/1878961.1878999
Published: 2010-10-24
Citations: 1
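The page reuse idea in the abstract above rests on a general property of NAND flash: programming can only change a cell from 1 to 0, and only a block erase returns cells to 1. The sketch below illustrates that property with a simple bitwise test for whether a page could be updated in place. It is an illustration of the constraint only; the helper name `can_reprogram` is hypothetical, and whether the paper uses this exact test is an assumption.

```python
def can_reprogram(old: bytes, new: bytes) -> bool:
    """Return True if `new` can be written over `old` without an erase.

    Programming can only clear bits (1 -> 0), so every bit set in `new`
    must already be set in `old`, i.e. (old & new) == new for each byte.
    """
    if len(old) != len(new):
        raise ValueError("page images must have the same length")
    return all((o & n) == n for o, n in zip(old, new))


if __name__ == "__main__":
    erased = bytes([0xFF, 0xFF])
    first = bytes([0xF0, 0xFF])
    print(can_reprogram(erased, first))              # only clears bits
    print(can_reprogram(first, bytes([0xF8, 0xFF])))  # needs a 0 -> 1 flip
```

An update that fails this test needs either a fresh page or a block erase, which is the cost that page reuse schemes try to avoid.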
NeuroNoC: Neural network inspired runtime adaptation for an on-chip communication architecture
T. Ebi, M. A. Faruque, J. Henkel
Abstract: The on-chip communication architecture presented in this paper, NeuroNoC, addresses the problems arising in large multi-core systems where global or local routing strategies do not work efficiently anymore since they either do not scale or lack information on the network state. Our communication architecture is runtime adaptive and it deploys a distributed artificial neural network to aid routing decisions. It thereby provides a light-weight mechanism for local routing information to propagate through the communication architecture and is capable of self-organizing efficiently (since scalable) to varying communication workload scenarios. The underlying basic concepts are borrowed from spiking neural networks, a special case of artificial neural networks. Our experiments show that already with low hardware overhead, a significant improvement of the runtime routing behavior compared to current state-of-the-art approaches is possible. We report an improvement of 23% in routing quality compared to wXY routing in terms of failed transactions.
DOI: https://doi.org/10.1145/1878961.1879002
Published: 2010-10-24
Citations: 4
An introduction to the SystemC synthesis subset standard
P. Coussy, A. Takach, Michael McNamara, M. Meredith
Abstract: High-level synthesis (HLS) offers the prospect of improving the productivity digital system design and the quality of the resulting implementations. Designing at higher levels of abstraction is a natural way for coping with system design complexity, for verifying earlier in the design process and for increasing design reuse. OSCI's synthesis working group (SWG) has led the effort of defining the synthesis subset for SystemC that is suitable for HLS. Draft version 1.3 of the document was released for public review in August 2009. While still in draft form, the released document provides guidance to both tool providers and users on the subset that is being proposed and the ability to provide feedback to the SWG on the draft. This tutorial will provide a brief introduction and three case studies on the use of HLS and the current SystemC synthesis subset draft for hardware design of digital systems.
DOI: https://doi.org/10.1145/1878961.1878993
Published: 2010-10-24
Citations: 8
Embedded market: Challenges and opportunities
V. Ilderem
Abstract: There is a convergence trend in the computing, communication and consumer markets and with a forecast of an additional 1 billion connected computing users by 2015, it is of high value to provide a common experience between the devices. Intel's vision of Compute Continuum will enable the users to realize the potential of a seamless cross-device experience with more consistency and accessibility to their information. The convergence trend and the Compute Continuum make System-on-Chip [SoC] a key ingredient for the embedded markets. At Intel Labs, we are focusing on delivering differentiating technology solutions to enable our business partners to successfully capture their targeted market segments. We are working on a variety of research that will enable modular system architecture and silicon technology breakthroughs for rapid customization and integration facilitating faster time-to-market. Intel's vision along with some technology challenges and possible solutions will be highlighted.
DOI: https://doi.org/10.1145/1878961.1878963
Published: 2010-10-24
Citations: 0
Unconventional fabrics, architectures, and models for future multi-core systems
R. Marculescu, C. Teuscher, P. Pande
Abstract: Massive level of integration is making modern multi-core chips all-pervasive in several domains. Hence, high performance, robustness, and low power are crucial for the widespread adoption of such platforms. However, achieving all these goals forces us to re-think the basis of designing multi-core systems at nanoscale, starting with the very substrate we need to use to implement such systems in the future, particularly for nanowire (or carbon nanotube) based on-chip interconnect obtained through self-assembly techniques. Due to the lack of control over these processes, such interconnects are expected to be largely unstructured. While large unstructured networks are easy to fabricate, they require unconventional architectures and communication paradigms. For instance, by getting inspiration from many natural systems with network-based architectures, the future multi-core systems at nanoscale are expected to be hierarchical and heterogeneous in nature, as many powerful features such as increased performance, better resource utilization, and an increased robustness against failures of many natural networks come precisely from their heterogeneity, unstructuredness, and hierarchical nature. As such, an important performance limitation of multi-core chips designed with regular network architectures arises from planar metal interconnect-based multi-hop links, where the data transfer between two distant blocks can cause high latency and power consumption.
DOI: https://doi.org/10.1145/1878961.1879017
Published: 2010-10-24
Citations: 0
Automatic parallelization of embedded software using hierarchical task graphs and integer linear programming
D. Cordes, P. Marwedel, A. Mallik
Abstract: The last years have shown that there is no way to disregard the advantages provided by multiprocessor System-on-Chip (MPSoC) architectures in the embedded systems domain. Using multiple cores in a single system enables to close the gap between energy consumption, problems concerning heat dissipation, and computational power. Nevertheless, these benefits do not come for free. New challenges arise, if existing applications have to be ported to these multiprocessor platforms. One of the most ambitious tasks is to extract efficient parallelism from these existing sequential applications. Hence, many parallelization tools have been developed, most of them are extracting as much parallelism as possible, which is in general not the best choice for embedded systems with their limitations in hardware and software support. In contrast to previous approaches, we present a new automatic parallelization tool, tailored to the particular requirements of the resource constrained embedded systems. Therefore, this paper presents an algorithm which automatically steers the granularity of the generated tasks, with respect to architectural requirements and the overall execution time reduction. For this purpose, we exploit hierarchical task graphs to simplify a new integer linear programming based approach in order to split up sequential programs in an efficient way. Results on real-life benchmarks have shown that the presented approach is able to speed sequential applications up by a factor of up to 3.7 on a four core MPSoC architecture.
DOI: https://doi.org/10.1145/1878961.1879009
Published: 2010-10-24
Citations: 56
An elastic software cache with fast prefetching for Motion Compensation in video decoding
P. Chao, Y. Lin
Abstract: Real-time decoding of ultrahigh resolution video using multicore architectures is important for future embedded systems. However, memory bandwidth is still a bottleneck of system performance. Video coding performs irregular DRAM access resulting in very low and unstable efficiency. The conventional cache approach is insufficient because it reduces only the redundant accesses to data that has already been fetched during prior-macroblock decoding. We present an Elastic Software Cache (ESC) for ultrahigh resolution video decoding on Scratchpad Memory (SPM)-based systems. Utilizing access region analysis, our latency-optimized prefetching scheme rearranges accesses to minimize both data redundancy and DRAM access latency. Compared to the conventional cache approach, our scheme requires only 4.6 Kbytes of SPM space but it can save up to 25% of memory access cycles resulting in both higher performance and lower power.
DOI: https://doi.org/10.1145/1878961.1878967
Published: 2010-10-24
Citations: 3
A performance model and code overlay generator for scratchpad enhanced embedded processors
Michael A. Baker, Amrit Panda, Nikhil Ghadge, Aniruddha Kadne, Karam S. Chatha
Abstract: Software managed scratchpad memories (SPMs) provide improved performance and power in embedded processors by reducing required hardware resources. Performance depends strongly on the scheme used to map code and data onto the SPM, but generating optimal mappings can be extremely difficult. Here we address instruction mapping on SPMs and present a performance model and algorithm, “Code Overlay Generator” (COG), for producing high performance dynamic SPM code mappings. Our heuristic does not require profiling information, and is suitable for generating mapping solutions for large programs which are otherwise infeasible using previously proposed Integer Linear Programming (ILP) techniques. We compare our algorithm with a published heuristic and the code overlay mapping algorithm provided with the Cell Broadband Engine (CBE) Synergistic Processing Unit (SPU) compiler from IBM, spu-gcc. We find an average performance advantage of 34% compared to the previous algorithm, and 87% with respect to spu-gcc. We additionally show that our performance model enables improved tools for offline evaluation of code overlay performance and mapping selection.
DOI: https://doi.org/10.1145/1878961.1879011
Published: 2010-10-24
Citations: 21