Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '06): Latest Papers

Data reuse driven energy-aware MPSoC co-synthesis of memory and communication architecture for streaming applications
I. Issenin, N. Dutt
DOI: https://doi.org/10.1145/1176254.1176326 | Published: 2006-10-22
Abstract: The memory subsystem of a complex multiprocessor system-on-chip (MPSoC) is an important contributor to chip power consumption. The choice of memory architecture and of communication architecture both affect the power efficiency of the design. In this paper we propose a novel approach that enables energy-aware co-synthesis of both memory and communication architecture for streaming applications. Unlike earlier techniques, we employ a powerful compile-time analysis of memory access behavior that adds flexibility in selecting memory architectures. Additionally, we target TDMA bus-based communication architectures, which not only guarantee performance but also greatly reduce design time and allow us to find the energy-optimal system configuration. We propose and compare three techniques: an optimal mixed-ILP-based co-synthesis technique; a mixed-ILP-based traditional two-step synthesis approach, in which memory and communication synthesis are performed sequentially; and a co-synthesis heuristic that synthesizes energy-efficient hierarchical bus-based communication architectures with guaranteed throughput. Our experimental results on a number of streaming applications show that both the traditional two-step synthesis approach and the heuristic result in up to 50% higher power consumption than the proposed co-synthesis approach. However, on some of the streaming benchmarks, our co-synthesis heuristic was able to find optimal or near-optimal results in much less time than the MILP co-synthesis approach.
Citations: 21
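The abstract credits TDMA buses with guaranteeing performance. As a minimal sketch of why (not the authors' MILP formulation or heuristic), assume a frame of `frame_slots` equal slots, each moving `bytes_per_slot` bytes in `slot_time_s` seconds: a master that owns k slots of every frame is guaranteed k slots' worth of bandwidth regardless of what the other masters do. The function names and parameters are hypothetical.

```python
import math


def guaranteed_bandwidth(owned_slots, frame_slots, bytes_per_slot, slot_time_s):
    """Bytes/s guaranteed to a master that owns `owned_slots` of every TDMA frame."""
    frame_time_s = frame_slots * slot_time_s
    return owned_slots * bytes_per_slot / frame_time_s


def allocate_slots(demands_bps, frame_slots, bytes_per_slot, slot_time_s):
    """Give each master the fewest slots that cover its demand (in bytes/s)."""
    per_slot = guaranteed_bandwidth(1, frame_slots, bytes_per_slot, slot_time_s)
    alloc = {master: math.ceil(d / per_slot) for master, d in demands_bps.items()}
    if sum(alloc.values()) > frame_slots:
        # The frame cannot guarantee every demand at once.
        raise ValueError("demands exceed the TDMA frame capacity")
    return alloc
```

With 8 slots of 4 bytes at 10 ns each, one slot is worth 50 MB/s, so `allocate_slots({"cpu": 120e6, "dsp": 30e6}, 8, 4, 1e-8)` assigns 3 slots to the CPU and 1 to the DSP.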
Automatic phase detection for stochastic on-chip traffic generation
A. Scherrer, A. Fraboulet, T. Risset
DOI: https://doi.org/10.1145/1176254.1176277 | Published: 2006-10-22
Abstract: Network-on-chip (NoC) prototyping is used to adapt NoC parameters to the application running on the chip. This prototyping is currently done with traffic generators that emulate the behavior of the SoC components (IPs): processors, hardware accelerators, etc. Traffic generated by processor-like IPs is highly irregular, so it must be decomposed into program phases. We propose an original feature for NoC prototyping, inspired by techniques used in processor architecture performance evaluation: the automatic detection of traffic phases. Integrated into our NoC prototyping environment, this feature yields a completely automatic toolchain for generating stochastic traffic generators. We show that our traffic generators precisely emulate the behavior of processors and that our environment is a versatile tool for network-on-chip prototyping. Simulations are performed in a SystemC-based simulation environment with a mesh network-on-chip (DSPIN) and a processor running MP3 decoding applications.
Citations: 33
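To illustrate the general idea of phase detection borrowed from processor performance evaluation, here is a minimal sketch, assuming a plain memory-address trace rather than the authors' traffic format: each fixed-size window is summarized by a normalized histogram of the address regions it touches, and a new phase is assumed to start when the Manhattan distance to the previous window exceeds a threshold. The window size, region granularity and threshold are hypothetical parameters (the window of 4 accesses only keeps the example small), and this is not the paper's detection algorithm.

```python
from collections import Counter


def window_signature(addresses, region_bits=12):
    """Normalized histogram of the memory regions (address >> region_bits) touched."""
    counts = Counter(a >> region_bits for a in addresses)
    total = sum(counts.values())
    return {region: n / total for region, n in counts.items()}


def manhattan(p, q):
    """L1 distance between two sparse histograms (0.0 = identical, 2.0 = disjoint)."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def detect_phases(trace, window=4, threshold=0.5, region_bits=12):
    """Return the trace indices where a new phase is assumed to start."""
    starts = [0]
    prev = None
    for i in range(0, len(trace), window):
        sig = window_signature(trace[i:i + window], region_bits)
        if prev is not None and manhattan(sig, prev) > threshold:
            starts.append(i)  # behavior changed noticeably: open a new phase
        prev = sig
    return starts
```

For a trace of eight accesses to 0x1000 followed by eight to 0x8000, `detect_phases(trace)` reports phases starting at indices 0 and 8.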
Floorplan driven leakage power aware IP-based SoC design space exploration
Aseem Gupta, N. Dutt, F. Kurdahi, K. Khouri, M. Abadir
DOI: https://doi.org/10.1145/1176254.1176284 | Published: 2006-10-22
Abstract: Multi-million-gate system-on-chip (SoC) designs increasingly rely on intellectual property (IP) blocks. However, due to technology scaling, the leakage power consumption of IP blocks has risen, potentially leading to thermal runaway. In IP-based design there has been a disconnect between system-level design and physical-level steps such as floorplanning, which can lead to failures in manufactured chips. This calls for coupling between system-level and physical-level design steps. The leakage power of an IP block increases with its temperature, which depends on the SoC's floorplan because of thermal diffusion. We have observed that different floorplans of the same SoC can differ by up to 3X in leakage power. Hence the system designer needs to be aware of this design space spanning floorplans and leakage power. We propose a leakage-aware exploration (LAX) framework that enables the system designer to create this design space early in the design cycle and provides an opportunity to make changes in the system design. We show the size of the design space generated by applying LAX to ten industrial SoC designs from Freescale Semiconductor Inc. and observe that leakage power can vary by as much as 190% for a 65% difference in inactive area.
Citations: 7
A bus architecture for crosstalk elimination in high performance processor design
W. Hsieh, Po-Yuan Chen, TingTing Hwang
DOI: https://doi.org/10.1145/1176254.1176314 | Published: 2006-10-22
Abstract: In deep sub-micron technology, the crosstalk effect between adjacent wires has become an important issue, especially on long on-chip buses. Crosstalk increases delay and power consumption and, in the worst case, causes incorrect results. In this paper, we propose a de-assembler/assembler structure that eliminates undesirable crosstalk effects on bus transmission. By taking advantage of prefetching, where the instruction/data fetch rate of a high-performance processor is always higher than its instruction/data commit rate, the proposed method causes hardly any performance loss. In addition, for a 128-bit bus it requires only 7 extra bus wires, compared with the 85 needed in [6].
Citations: 6
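Crosstalk between adjacent bus wires is often estimated with a simplified coupling model: each wire's transition is +1 (rising), -1 (falling) or 0, and a switching wire's delay grows roughly with 1 + lambda * f, where f sums |d_i - d_j| over its two neighbors. The sketch below computes the worst f for one bus transition; f = 4 (both neighbors switching opposite to the victim) is the case crosstalk-avoidance schemes try to rule out. It only illustrates the problem, under that assumed model, and does not implement the paper's de-assembler/assembler.

```python
def transition_deltas(prev, curr, width):
    """Per-wire transition, bit 0 first: +1 rising, -1 falling, 0 unchanged."""
    return [((curr >> i) & 1) - ((prev >> i) & 1) for i in range(width)]


def worst_coupling_factor(prev, curr, width):
    """Max over switching wires of sum |d_i - d_j| for adjacent wires j (0..4)."""
    d = transition_deltas(prev, curr, width)
    worst = 0
    for i, di in enumerate(d):
        if di == 0:
            continue  # a quiet wire has no transition delay (noise glitches ignored)
        factor = 0
        if i > 0:
            factor += abs(di - d[i - 1])
        if i < width - 1:
            factor += abs(di - d[i + 1])
        worst = max(worst, factor)
    return worst
```

For example, the 3-bit transition 010 to 101 makes the middle wire fall while both neighbors rise, giving the worst-case factor of 4, whereas 000 to 111 (all wires rising together) gives 0.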
Decision-theoretic exploration of multiProcessor platforms
G. Beltrame, Dario Bruschi, D. Sciuto, C. Silvano
DOI: https://doi.org/10.1145/1176254.1176305 | Published: 2006-10-22
Abstract: In this paper, we present an efficient technique for design space exploration of a multiprocessor platform that minimizes the number of simulations needed to identify the approximate power-performance Pareto curve. Instead of using semi-random search algorithms (such as simulated annealing, tabu search or genetic algorithms), we use domain knowledge derived from the platform architecture to set up exploration as a decision problem. Each action in the decision-theoretic framework corresponds to a change in the platform parameters. Simulation is performed only when the information about the probability of action outcomes becomes insufficient for a decision. The algorithm has been tested with two industrial multimedia applications, namely an MPEG4 encoder and an Ogg-Vorbis decoder. Results show that the number of processors and the two-level cache size and policy can be explored with fewer than 15 simulations at 95% accuracy, increasing exploration speed by one order of magnitude compared with traditional operations research techniques.
Citations: 18
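For context on the "power-performance Pareto curve": a configuration is Pareto-optimal if no other configuration is at least as good in both power and execution time and strictly better in one. This brute-force filter over already-simulated points is a minimal sketch of that definition only; it does not reproduce the decision-theoretic search, and the configuration names and values used below are made up.

```python
def pareto_front(points):
    """Keep configurations not dominated in (power, exec_time), both minimized.

    `points` maps a configuration name to a (power, exec_time) tuple.
    """
    front = {}
    for name, (p, t) in points.items():
        dominated = any(
            p2 <= p and t2 <= t and (p2 < p or t2 < t)
            for other, (p2, t2) in points.items()
            if other != name
        )
        if not dominated:
            front[name] = (p, t)
    return front
```

With `{"a": (1.0, 10.0), "b": (2.0, 5.0), "c": (2.5, 6.0), "d": (3.0, 4.0)}`, configuration c is dominated by b (lower power and shorter time), so the front is a, b and d. The quadratic scan is fine for the handful of simulations the abstract mentions.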
Retargetable code optimization with SIMD instructions
M. Hohenauer, Christoph Schumacher, R. Leupers, G. Ascheid, H. Meyr, Hans van Someren
DOI: https://doi.org/10.1145/1176254.1176291 | Published: 2006-10-22
Abstract: Retargetable C compilers are nowadays widely used to quickly obtain compiler support for new embedded processors and to perform early processor architecture exploration. One frequent concern about retargetable compilers, though, is their lack of the machine-specific code optimization techniques needed to achieve the highest code quality. While this problem is partially inherent to the retargetable compilation approach, it can be circumvented by designing flexible, configurable code optimization techniques that apply to a certain range of target architectures. This paper focuses on target machines with SIMD instruction support, which is widespread in embedded processors for multimedia applications. We present an efficient and quickly retargetable SIMD code optimization technique that is integrated into an industrial retargetable C compiler. Experimental results for the Philips Trimedia processor demonstrate that the proposed technique applies to real-life target machines and that it produces code quality improvements close to the theoretical limit.
Citations: 13
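To make "SIMD instruction support" concrete: a typical multimedia SIMD instruction adds several narrow lanes packed into one register in a single operation. The sketch below emulates a hypothetical 4 x 8-bit wrap-around packed add on 32-bit integers using SWAR bit tricks, next to the scalar per-lane loop a compiler would otherwise have to emit. It illustrates instruction semantics only and is not the paper's retargetable optimization technique.

```python
def packed_add_u8x4(a, b):
    """Lane-wise wrap-around add of four 8-bit lanes packed in 32-bit words.

    Bits 0..6 of each lane are added with bit 7 masked off, so no carry can
    cross into the next lane; bit 7 of each lane is then fixed up with XOR.
    """
    low7 = (a & 0x7F7F7F7F) + (b & 0x7F7F7F7F)
    return (low7 ^ ((a ^ b) & 0x80808080)) & 0xFFFFFFFF


def packed_add_reference(a, b):
    """Scalar loop: extract, add and reinsert each 8-bit lane separately."""
    out = 0
    for lane in range(4):
        x = (a >> (8 * lane)) & 0xFF
        y = (b >> (8 * lane)) & 0xFF
        out |= ((x + y) & 0xFF) << (8 * lane)
    return out
```

Both functions map 0x01020304 and 0x10FF0101 to 0x11010405; note how the 0x02 + 0xFF lane wraps to 0x01 without disturbing its neighbors.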
Yield prediction for architecture exploration in nanometer technology nodes: a model and case study for memory organizations
A. Papanikolaou, T. Grabner, M. Corbalan, P. Roussel, F. Catthoor
DOI: https://doi.org/10.1145/1176254.1176315 | Published: 2006-10-22
Abstract: Process variability has a detrimental impact on the performance of memories and other system components, which can lead to parametric yield loss at the system level due to timing violations. Conventional yield models cannot analyze this accurately, at least not at the system level. In this paper we propose a technique to estimate this system-level yield loss for a number of alternative memory organization implementations. This can help the designer make educated architecture-level trade-offs between energy consumption and parametric timing yield, by using memories from different available libraries with different energy/performance characteristics while considering the impact of manufacturing variations. The technique is highly accurate, with a reported average error of less than 1%, which enables early exploration of the available options.
Citations: 8
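A naive baseline helps show what "system-level parametric timing yield" means. Assuming, purely for illustration, that each memory's access delay is an independent Gaussian with a given mean and standard deviation, the system meets timing only if every memory does, so the yield is the product of the per-memory probabilities P(delay <= clock period). The independence and Gaussian assumptions and the function names are hypothetical; the paper's own model, with its reported sub-1% error, is not reproduced here.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def timing_yield(memories, clock_period):
    """Probability that every memory meets `clock_period`.

    `memories` is a list of (mean_delay, std_dev) pairs, assumed independent
    and Gaussian.
    """
    y = 1.0
    for mean, std in memories:
        y *= normal_cdf((clock_period - mean) / std)
    return y
```

For example, `timing_yield([(1.0, 0.1), (1.0, 0.1)], 1.0)` gives 0.25: each memory meets a clock period equal to its mean delay only half the time, which is why slower but lower-energy library choices can cost yield.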
Cutting across layers of abstraction: removing obstacles from the advancement of embedded systems
K. Flautner
DOI: https://doi.org/10.1145/1176254.1176318 | Published: 2006-10-22
Abstract: Silicon technology evolution over the last four decades has yielded an exponential increase in integration densities, with steady improvements in performance and power consumption at each technology generation. This steady progress has created a sense of entitlement to the riches that future process generations would bring. Today, however, classical process scaling seems to be dead, and living up to technology expectations requires continuous innovation at many levels, at steadily rising implementation and design costs. Solutions to problems need to cut across layers of abstraction and require coordination between software, architecture and circuit features. Heterogeneous multiprocessor clusters are increasingly used to deliver the required compute power for high-end applications. Heterogeneity ensures that the necessary processing power can be delivered at high efficiency and reasonable implementation cost, while the use of processors endows these systems with a large degree of flexibility. One of the key challenges with these systems is system-level programming. Traditional compiler technologies are strong at programming individual cores but leave the task of parallelization to a team of experts. The first part of this talk will describe the coupling of the compiler to the system architecture on a multi-core signal-processing cluster and illustrate how compiler technology can enable writing portable parallel programs for it using little more than C. As claimed above, close coupling of abstraction layers can be beneficial. This can also be illustrated at the microarchitecture-circuit boundary. The second part of the talk will describe a prototype microarchitecture designed explicitly to deal with issues such as silicon variation and soft errors. In return, these features enable system designers to focus on the typical-case performance of their implementations without being over-constrained by worst-case conditions.
Citations: 1
Demand paging for OneNAND™ Flash eXecute-in-place
Yongsoo Joo, Yongseok Choi, Chanik Park, S. Chung, Eui-Young Chung, N. Chang
DOI: https://doi.org/10.1145/1176254.1176310 | Published: 2006-10-22
Abstract: NAND flash memory can provide cost-effective secondary storage in mobile embedded systems, but its lack of random access capability means that code shadowing is generally required, taking up extra RAM space. Demand paging with NAND flash memory has recently been proposed as an alternative that requires less RAM. This scheme is even more attractive for OneNAND flash, which consists of a NAND flash array with SRAM buffers and supports eXecute-In-Place (XIP), allowing limited random access to data in the SRAM buffers. We introduce a novel demand paging method for OneNAND flash memory with the XIP feature. The proposed on-line demand paging method with XIP uses a finite-size sliding window to capture the paging history and thus predict future page demands. We focus particularly on non-critical code accesses, which can disturb real-time code. Experimental results show that our method outperforms conventional LRU-based demand paging by 57% in execution time and by 63% in energy consumption. It even beats the optimal solution obtained with MIN, a conventional off-line demand paging technique, by 30% and 40%, respectively.
Citations: 36
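The abstract compares against two classic replacement policies. For reference, here is a sketch that counts page faults under LRU and under Belady's MIN, which evicts the resident page whose next use lies farthest in the future and therefore needs the whole reference string in advance (hence "off-line"). MIN minimizes the number of page faults; the abstract's comparison is in execution time and energy. The paper's sliding-window XIP method is not reproduced.

```python
def lru_faults(refs, frames):
    """Count page faults under least-recently-used replacement."""
    cache = []  # most recently used page is kept at the end
    faults = 0
    for page in refs:
        if page in cache:
            cache.remove(page)
        else:
            faults += 1
            if len(cache) == frames:
                cache.pop(0)  # evict the least recently used page
        cache.append(page)
    return faults


def min_faults(refs, frames):
    """Count page faults under Belady's MIN (refs must be a list)."""
    cache = set()
    faults = 0
    for i, page in enumerate(refs):
        if page in cache:
            continue
        faults += 1
        if len(cache) == frames:
            def next_use(p):
                try:
                    return refs.index(p, i + 1)
                except ValueError:
                    return float("inf")  # never used again: ideal victim
            cache.remove(max(cache, key=next_use))
        cache.add(page)
    return faults
```

On the reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 with three frames, LRU incurs 10 faults and MIN 7.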
Bounded arbitration algorithm for QoS-supported on-chip communication
M. A. Faruque, Gereon Weiss, J. Henkel
DOI: https://doi.org/10.1145/1176254.1176275 | Published: 2006-10-22
Abstract: Time-critical multiprocessor systems require guaranteed services in terms of throughput, bandwidth, etc. in order to comply with hard real-time constraints. However, guaranteed-service schemes suffer from low resource utilization. To the best of our knowledge, we present the first approach for on-chip communication that provides high resource utilization under a transaction-specific, flexible (i.e. with different classifications of data exchange) communication scheme, while still providing tight time-related guarantees. To this end, we present a bounded arbitration scheme that considers lower and upper bounds for each transaction level. We demonstrate its advantages with a complete MPEG4 decoder case study and, under these constraints, achieve a bandwidth utilization of up to 100% (97% on average) with guaranteed (100%) bandwidth.
Citations: 16
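One way to read "lower and upper bounds for each transaction level" is sketched below as a hypothetical slot-allocation policy, not the authors' arbiter. Within a frame, every transaction class first receives up to its lower bound (its guarantee), and leftover slots are then handed out round-robin up to each class's upper bound, so guarantees that go unused are not wasted. The class names, bounds and frame size are illustrative assumptions.

```python
def bounded_allocation(requests, bounds, frame_slots):
    """Split `frame_slots` among transaction classes (hypothetical policy).

    requests: class -> slots requested in this frame
    bounds:   class -> (lower, upper) slot bounds
    """
    if sum(lo for lo, _ in bounds.values()) > frame_slots:
        raise ValueError("lower bounds exceed the frame size")
    # Phase 1: honor each class's guarantee, but only as far as it has demand.
    grant = {c: min(requests.get(c, 0), lo) for c, (lo, _) in bounds.items()}
    spare = frame_slots - sum(grant.values())
    # Phase 2: round-robin the spare slots, capped by demand and upper bound.
    progress = True
    while spare > 0 and progress:
        progress = False
        for c, (_, hi) in bounds.items():
            if spare > 0 and grant[c] < min(requests.get(c, 0), hi):
                grant[c] += 1
                spare -= 1
                progress = True
    return grant
```

With bounds `{"rt": (4, 6), "be": (1, 8)}`, a 10-slot frame and requests of 5 and 9 slots, each class ends up with 5 slots: all 10 slots are used while "rt" still receives at least its guaranteed 4.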