2008 IEEE International Conference on Computer Design最新文献

筛选
英文 中文
Analysis and minimization of practical energy in 45nm subthreshold logic circuits 45nm亚阈值逻辑电路中实际能量的分析与最小化
2008 IEEE International Conference on Computer Design Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751876
D. Bol, R. Ambroise, D. Flandre, J. Legat
{"title":"Analysis and minimization of practical energy in 45nm subthreshold logic circuits","authors":"D. Bol, R. Ambroise, D. Flandre, J. Legat","doi":"10.1109/ICCD.2008.4751876","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751876","url":null,"abstract":"Over the last decade, the design of ultra-low-power digital circuits in subthreshold regime has been driven by the quest for minimum energy per operation. In this contribution, we observe that operating at minimum-energy point is not straightforward as design constraints from real-life applications have an important impact on energy. Therefore, we introduce the alternative concept of practical energy, taking functional-yield and throughput constraints on minimum Vdd into account. In this context, we demonstrate for the first time the detrimental impact of DIBL on minimum Vdd. Practical energy gives a useful analysis framework of circuit optimization to reach minimum-energy point, while considering the throughput as an input variable dictated by the application. From simulation of a benchmark multiplier in 45 nm technology, we find out that practical energy can be far higher than minimum energy point, in the case of low-throughput applications (ap 10-100 kOp/s) because of static leakage energy and robustness-limited minimum Vdd. With the proposed framework, we investigate the capability of conventional optimization techniques to make practical energy meet minimum energy point. Amongst these techniques, channel length upsize is shown to be more efficient than MTCMOS power gating, body biasing, Vt selection or device width upsize, as it increases robustness while simultaneously reducing static leakage energy. A small length upsize with low area overhead is shown to reduce practical energy at low throughput to less than 2.1 times the minimum energy level. At medium throughput, it even brings practical energy 30% lower than minimum energy level without optimization techniques.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133756057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 53
Power switch characterization for fine-grained dynamic voltage scaling 细粒度动态电压缩放的功率开关特性
2008 IEEE International Conference on Computer Design Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751923
Liang Di, M. Putic, J. Lach, B. Calhoun
{"title":"Power switch characterization for fine-grained dynamic voltage scaling","authors":"Liang Di, M. Putic, J. Lach, B. Calhoun","doi":"10.1109/ICCD.2008.4751923","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751923","url":null,"abstract":"Dynamic voltage scaling (DVS) provides power savings for systems with varying performance requirements. One low overhead implementation of DVS uses PMOS power switches to connect DVS blocks to one of the available VDD supplies. While power switches have been analyzed extensively for leakage power gating, proper design of power switches for DVS is less well understood. This paper characterizes power switches for DVS in terms of VDD-switching delay and VDD-switching energy. We show the impact of these switching overheads on a novel fine-grained DVS architecture and present an RC model that allows fast estimation of the overhead. Measurements of a DVS multiplier and adder on a 90 nm CMOS test chip validate the model. Our model and measurements confirm that power switched DVS can provide sufficiently low overhead to give energy savings with only one clock cycle spent at a lower voltage, making this approach a flexible and enticing option for embedded portable systems.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128079974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 15
A high-performance parallel CAVLC encoder on a fine-grained many-core system 基于细粒度多核系统的高性能并行CAVLC编码器
2008 IEEE International Conference on Computer Design Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751869
Zhibin Xiao, B. Baas
{"title":"A high-performance parallel CAVLC encoder on a fine-grained many-core system","authors":"Zhibin Xiao, B. Baas","doi":"10.1109/ICCD.2008.4751869","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751869","url":null,"abstract":"This paper presents a high-performance parallel context-based adaptive length coding (CAVLC) encoder implemented on a fine-grained many-core system. The software encoder is designed for a H.264/AVC baseline profile encoder. By utilizing arithmetic table elimination and compression techniques, the data-flow of the CAVLC encoder has been partitioned and mapped to an array of 15 small processors. The parallel workload of each processor is characterized and balanced for further throughput optimization. The proposed parallel CAVLC encoder achieves the real-time processing requirement of 30 frames per second for 720 p HDTV. Our experiments show that the presented CAVLC encoder has 4.86 to 6.83 times higher throughput and requires far smaller chip area than the identical encoder implemented on state-of-art general-purpose processors. In comparison to published implementations on common DSP processors, the design has approximately 1.0 to 6.15 times higher throughput while requiring less than 6 times smaller area.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123303639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 23
A parallel Steiner tree heuristic for macro cell routing 宏单元路由的并行Steiner树启发式算法
2008 IEEE International Conference on Computer Design Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751836
C. Fobel, G. Grewal
{"title":"A parallel Steiner tree heuristic for macro cell routing","authors":"C. Fobel, G. Grewal","doi":"10.1109/ICCD.2008.4751836","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751836","url":null,"abstract":"Global routing of macro cells remains an important but time-consuming step in the VLSI design cycle. Macro cells are large, irregularly sized parameterized circuit modules that typically contain large numbers of terminals that must be interconnected. The interconnection pattern for each set of terminals (net) that must be connected is a Steiner tree, and the primary sub-problem in the global routing of macro cells is to find a set of dissimilar, low-cost Steiner trees for each net that must be routed. In this paper, a two-phase, parallel (multi-processor) algorithm is proposed for quickly constructing a diverse pool of high-quality Steiner trees for routing of multi-terminal nets. In the first phase, a single Steiner tree is constructed using a heuristic, called Shrubbery. Then, in the second phase, a pool of dissimilar, high-quality trees are created from the original tree, by running multiple instances of a local search in parallel. Computational experiments performed on over 800 commonly used benchmarks show that running multiple instances of the local search in parallel results in near-linear speed-up over the serial case. Most importantly, the trees produced are both high-quality and dissimilar, allowing for numerous routing possibilities for each net.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124779796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
A novel, highly SEU tolerant digital circuit design approach 一种新颖的、高度容限的数字电路设计方法
2008 IEEE International Conference on Computer Design Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751834
Rajesh Garg, S. Khatri
{"title":"A novel, highly SEU tolerant digital circuit design approach","authors":"Rajesh Garg, S. Khatri","doi":"10.1109/ICCD.2008.4751834","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751834","url":null,"abstract":"In this paper, we present a new radiation tolerant CMOS standard cell library, and demonstrate its effectiveness in implementing radiation hardened digital circuits. We exploit the fact that if a gate is implemented using only PMOS (NMOS) transistors then a radiation particle strike can result only in logic a 0 to 1 (1 to 0) flip. Based on this observation, we derive our radiation hardened gates from regular static CMOS gates. In particular, we separate the PMOS and NMOS devices, and split the gate output into two signals. One of these outputs of our radiation tolerant gate is generated using PMOS transistors, and it drives other PMOS transistors (only) in its fanout. Similarly, the other output (generated from NMOS transistors) drives only other NMOS transistors in its fanout. Now, if a radiation particle strikes one of the outputs of the radiation tolerant gate, then the gates in the fanout enter a high-impedance state, and hence preserve their output values. Our radiation hardened gates exhibit an extremely high degree of SEU tolerance, which is validated at the circuit level. Using these gates, we also implement circuit level hardening based on logical masking, to selectively harden those gates in a circuit which contribute most to the soft error failure of the circuit. The gates with a low probability of logical masking are replaced by SEU tolerant gates from our new library, such that the digital design achieves a 90% soft error rate reduction. Experimental results demonstrate that this reduction is achieved with a modest layout area and delay penalty of 62% and 29% respectively, for area mapped designs. In contrast with existing approaches, our approach results in SEU immunity for extremely large critical charge values (>650fC).","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120961689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 24
Understanding performance, power and energy behavior in asymmetric multiprocessors 了解非对称多处理器的性能、功耗和能源行为
2008 IEEE International Conference on Computer Design Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751903
Nagesh B. Lakshminarayana, Hyesoon Kim
{"title":"Understanding performance, power and energy behavior in asymmetric multiprocessors","authors":"Nagesh B. Lakshminarayana, Hyesoon Kim","doi":"10.1109/ICCD.2008.4751903","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751903","url":null,"abstract":"Multiprocessor architectures are becoming popular in both desktop and mobile processors. Among multiprocessor architectures, asymmetric architectures show promise in saving energy and power. However, the performance and energy consumption behavior of asymmetric multiprocessors with desktop-oriented multithreaded applications has not been studied widely. In this study, we measure performance and power consumption in asymmetric and symmetric multiprocessors using real 8 and 16 processor systems to understand the relationships between thread interactions and performance/power behavior. We find that when the workload is asymmetric, using an asymmetric multiprocessor can save energy, but for most of the symmetric workloads, using a symmetric multiprocessor (with the highest clock frequency) consumes less energy.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115411711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
Configurable rectilinear Steiner tree construction for SoC and nano technologies 可配置的线性斯坦纳树结构的SoC和纳米技术
2008 IEEE International Conference on Computer Design Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751837
I. Jiang, Yen-Ting Yu
{"title":"Configurable rectilinear Steiner tree construction for SoC and nano technologies","authors":"I. Jiang, Yen-Ting Yu","doi":"10.1109/ICCD.2008.4751837","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751837","url":null,"abstract":"The rectilinear Steiner minimal tree (RSMT) problem is essential in physical design. Moreover, the variant constraints for fabrication issues, including obstacle avoidance, multiple routing layers, layer-specific routing directions, cannot be ignored during RSMT construction for modern SoC and nano technologies. This paper proposes a construction-by-correction approach for obstacle-avoiding preferred direction rectilinear Steiner tree construction. Experimental results show that our algorithm is promising and outperforms the state-of-the-art works.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127255182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
A family of scalable FFT architectures and an implementation of 1024-point radix-2 FFT for real-time communications 一个可扩展的FFT体系结构家族和用于实时通信的1024点基数-2 FFT实现
2008 IEEE International Conference on Computer Design Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751880
A. Suleiman, H. Saleh, A. Hussein, D. Akopian
{"title":"A family of scalable FFT architectures and an implementation of 1024-point radix-2 FFT for real-time communications","authors":"A. Suleiman, H. Saleh, A. Hussein, D. Akopian","doi":"10.1109/ICCD.2008.4751880","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751880","url":null,"abstract":"The paper presents a family of architectures for FFT implementation based on the decomposition of the perfect shuffle permutation, which can be designed with variable number of processing elements. This provides designers with a trade-off choice of speed vs. complexity (cost and area.). A detailed case study is provided on the implementation of 1024-point FFT with 2 processing elements using 45 nm process technology, including area, timing, power and place-and-route results.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126157235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 16
Timing analysis considering IR drop waveforms in power gating designs 功率门控设计中考虑红外降波的时序分析
2008 IEEE International Conference on Computer Design Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751912
Shih-Hung Weng, Yu-Min Kuo, Shih-Chieh Chang, M. Marek-Sadowska
{"title":"Timing analysis considering IR drop waveforms in power gating designs","authors":"Shih-Hung Weng, Yu-Min Kuo, Shih-Chieh Chang, M. Marek-Sadowska","doi":"10.1109/ICCD.2008.4751912","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751912","url":null,"abstract":"IR drop noise has become a critical issue in advanced process technologies. Traditionally, timing analysis in which the IR drop noise is considered assumes a worst-case IR drop for each gate; however, using this assumption provides unduly pessimistic results. In this paper, we describe a timing analysis approach for power gating designs. To improve the accuracy of the gate delay calculation we determine the virtual voltage level by taking into account the IR drop waveforms across the sleep transistors. These can be obtained efficiently using a linear programming approach. Our experimental results are very promising.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126456009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
Optimizing data sharing and address translation for the Cell BE Heterogeneous Chip Multiprocessor 优化Cell BE异构芯片多处理器的数据共享和地址转换
2008 IEEE International Conference on Computer Design Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751904
M. Gschwind
{"title":"Optimizing data sharing and address translation for the Cell BE Heterogeneous Chip Multiprocessor","authors":"M. Gschwind","doi":"10.1109/ICCD.2008.4751904","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751904","url":null,"abstract":"Heterogeneous Chip Multiprocessors (HMPs), such as the Cell Broadband Engine, offer a new design optimization opportunity by allowing designers to provide accelerators for application specific domains. Data sharing between CPUs and accelerators, and memory access mechanisms and protocols are crucial decisions in the design of an HMP. In this article, we analyze the choices between hardware and software managed coherence between CPU and accelerators for DMA-based data sharing, and find that hardware-coherent DMA shows a performance benefit of up to 3x, even for simple workloads.We explore memory address translation architecture choices for DMA-based data sharing. In multiprogramming environments, address translation is commonly used to separate processes. For efficiency, direct access to system memory requires address translation capabilities in the accelerator. We find that hardware managed address translation shows a performance benefit of up to 5x, even for simple workloads, by avoiding the costs of accelerator/CPU communication and supervisor management of the translation context and the introduction of a serial bottleneck on the CPU.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125674744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信