2007 25th International Conference on Computer Design: latest papers, page 4

Power efficient register file update approach for embedded processors

2007 25th International Conference on Computer Design Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601935

R. Ayoub, A. Orailoglu

Abstract: In this paper we present an approach for a low-power register file in the domain of embedded processors. The suggested approach obtains power savings by tackling unnecessary writes to the register file for short-lived registers. A write to the register file is essentially redundant when an instruction manages to forward its result to all of its dependents through the forwarding hardware. As the percentage of registers that exhibit short liveness is shown to be significant, tackling unnecessary writes delivers appreciable power savings. In this work we show that unnecessary writes can be tackled efficiently through a register-based encoding scheme. The suggested encoding scheme exploits application-specific information and renames all or most of the short-lived registers to a small subset of the registers that is prespecified during hardware design. The renaming is performed at the compiler level. Power savings are obtained by precluding the set of prespecified registers from writing to the register file. We suggest efficient renaming algorithms: one for the cases with no register pressure and another for the cases with register pressure. Under register pressure, some of the prespecified registers may need to be turned into normal registers, a process that is managed through reprogrammable hardware support. Although register pressure could reduce the power savings, the detailed analysis we outline shows that the prespecified subset is typically small, which makes register pressure an infrequent event. Experimental analysis on numerical and DSP codes indicates appreciable improvements in power savings.

Pages: 431-437
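The redundancy condition described in the abstract (a result that reaches all of its dependents through forwarding never needs to be written back) can be illustrated with a small counting sketch. This is not the authors' renaming algorithm. It assumes straight-line code, a hypothetical `forward_window` parameter standing in for the reach of the forwarding hardware, and the conservative rule that a value counts as dead only once its register is redefined. The names `Instr`, `count_redundant_writes`, and `PROGRAM` are made up for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    dest: Optional[str]      # register written, or None (e.g. a store)
    srcs: Tuple[str, ...]    # registers read


def count_redundant_writes(program: List[Instr], forward_window: int = 2) -> Tuple[int, int]:
    """Return (redundant_writes, total_writes) for straight-line code.

    Assumption: a write is redundant when the value has at least one reader,
    every reader is within `forward_window` instructions of the producer,
    and the register is redefined later (so the value is provably dead).
    """
    redundant = 0
    total = 0
    for i, ins in enumerate(program):
        if ins.dest is None:
            continue
        total += 1
        readers = []
        redefined = False
        for j in range(i + 1, len(program)):
            # An instruction reads its sources before it writes its destination.
            if ins.dest in program[j].srcs:
                readers.append(j)
            if program[j].dest == ins.dest:
                redefined = True
                break
        if redefined and readers and all(j - i <= forward_window for j in readers):
            redundant += 1
    return redundant, total


# Hypothetical seven-instruction block used only for illustration.
PROGRAM = [
    Instr("r1", ()),            # 0: r1 = load
    Instr("r2", ("r1",)),       # 1: r2 = f(r1)
    Instr("r3", ("r2",)),       # 2: r3 = g(r2)
    Instr("r1", ("r3",)),       # 3: r1 = h(r3)
    Instr("r2", ("r1",)),       # 4: r2 = k(r1)
    Instr(None, ("r3", "r2")),  # 5: store r3, r2
    Instr("r3", ()),            # 6: r3 = load
]
```

With a window of 2, `count_redundant_writes(PROGRAM)` returns `(2, 6)`: the first definitions of `r1` and `r2` are consumed immediately and then overwritten, while `r3` is still read three instructions after it is produced. A compiler pass in the spirit of the abstract would rename such short-lived values onto the prespecified non-writing registers.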

Citations: 2

A 4.6Tbits/s 3.6GHz single-cycle NoC router with a novel switch allocator in 65nm CMOS

2007 25th International Conference on Computer Design Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601881

A. Kumar, Partha Kundu, A.P. Singh, Li-Shiuan Peh, N. K. Jha

Abstract: As chip multiprocessors (CMPs) become the only viable way to scale up and utilize the abundant transistors made available in current microprocessors, the design of on-chip networks is becoming critically important. These networks face unique design constraints and are required to provide extremely fast and high-bandwidth communication, yet meet tight power and area budgets. In this paper, we present a detailed design of our on-chip network router targeted at a 36-core shared-memory CMP system in 65 nm technology. Our design targets an aggressive clock frequency of 3.6 GHz, thus posing tough design challenges that led to several unique circuit and microarchitectural innovations and design choices, including a novel high-throughput and low-latency switch allocation mechanism, a non-speculative single-cycle router pipeline which uses advanced bundles to remove control setup overhead, a low-complexity virtual channel allocator, and a dynamically managed shared buffer design which uses prefetching to minimize critical path delay. Our router takes up 1.19 mm² area and expends 551 mW power at 10% activity, delivering a single-cycle no-load latency at 3.6 GHz clock frequency while achieving a peak switching data rate in excess of 4.6 Tbits/s per router node.

Pages: 63-70
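The headline figures in the abstract can be related with some back-of-the-envelope arithmetic. The sketch below only restates the published numbers. It assumes one router per core in the 36-core system (the abstract does not say so), treats 4.6 Tbits/s as an exact value even though it is stated as a lower bound, and uses helper names invented for this illustration.

```python
# Figures quoted in the abstract.
CLOCK_HZ = 3.6e9          # 3.6 GHz router clock
PEAK_RATE_BPS = 4.6e12    # "in excess of" 4.6 Tbits/s per router node
ROUTER_POWER_W = 0.551    # 551 mW at 10% activity
ROUTER_AREA_MM2 = 1.19    # area of one router
CORES = 36                # assumption: one router per core


def bits_per_cycle(rate_bps: float, clock_hz: float) -> float:
    """Bits switched per clock cycle at the given peak rate."""
    return rate_bps / clock_hz


def network_totals(routers: int, power_w: float, area_mm2: float):
    """Aggregate power (W) and area (mm^2) if every router matches the quoted one."""
    return routers * power_w, routers * area_mm2


if __name__ == "__main__":
    per_cycle = bits_per_cycle(PEAK_RATE_BPS, CLOCK_HZ)
    total_power, total_area = network_totals(CORES, ROUTER_POWER_W, ROUTER_AREA_MM2)
    print(f"{per_cycle:.0f} bits per cycle per router")
    print(f"{total_power:.2f} W and {total_area:.2f} mm^2 for {CORES} routers")
```

Under these assumptions the peak rate corresponds to roughly 1278 bits switched per cycle at each router, and 36 such routers would total about 19.8 W and 42.8 mm² at 10% activity.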

Citations: 212

Optimized design of a double-precision floating-point multiply-add-fused unit for data dependence

2007 25th International Conference on Computer Design Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601918

Gongqiong Li, Zhaolin Li

Citations: 0

Placement and routing of RF embedded passive designs in LCP substrate

2007 25th International Conference on Computer Design Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601913

M. Pathak, S. Mukherjee, M. Swaminathan, E. Engin, S. Lim

Citations: 0

Memory based computation using embedded cache for processor yield and reliability improvement

2007 25th International Conference on Computer Design Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601922

Somnath Paul, S. Bhunia

Citations: 1

Prioritizing verification via value-based correctness criticality

2007 25th International Conference on Computer Design Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601921

Joonhyuk Yoo, M. Franklin

Abstract: Microprocessors are becoming increasingly susceptible to soft errors due to the current trends of semiconductor technology scaling. Traditional redundant multi-threading architectures provide good fault tolerance by re-executing all the computations. However, such full re-execution significantly increases the demand on processor resources, resulting in severe performance degradation. To address this problem, this paper introduces a correctness-criticality-based filter checker, which prioritizes the verification candidates so as to verify selectively. Binary Correctness Criticality (BCC) and Likelihood of Correctness Criticality (LoCC) are metrics that quantify, respectively, whether an instruction is important for reliability and how likely an instruction is to be correctness-critical. The likelihood of correctness criticality is computed from a value vulnerability factor, which is defined by the numerically significant bit-width used to compute a result. The proposed technique exploits information redundancy by compressing computationally useful data bits. Based on the likelihood of correctness criticality test, the filter checker mitigates the verification workload by bypassing instructions that are unimportant for correct execution. Extensive measurements show that the LoCC metric yields quite a wide distribution of values, indicating that it has the potential to differentiate diverse degrees of correctness criticality. Experimental results show that the proposed scheme accelerates a traditional fully-fault-tolerant processor by 1.7 times, while it reduces the soft error rate to 18% of that of a non-fault-tolerant processor.

Pages: 333-340
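The abstract ties the likelihood of correctness criticality to the numerically significant bit-width of a result. The following is a minimal sketch of that idea, not the paper's definition. It assumes 64-bit two's-complement results, takes the value vulnerability factor to be the fraction of the word that is significant, and uses a hypothetical threshold rule in which low-scoring results are bypassed by the filter checker. The function names and the 0.25 threshold are invented for illustration.

```python
def significant_bits(value: int, width: int = 64) -> int:
    """Minimum two's-complement width (including the sign bit) that holds `value`."""
    low, high = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {width} bits")
    magnitude = value if value >= 0 else ~value
    return magnitude.bit_length() + 1


def value_vulnerability(value: int, width: int = 64) -> float:
    """Assumed factor: fraction of the machine word that is numerically significant."""
    return significant_bits(value, width) / width


def needs_verification(value: int, threshold: float = 0.25, width: int = 64) -> bool:
    """Hypothetical filter rule: verify only results at or above the threshold."""
    return value_vulnerability(value, width) >= threshold


def filter_candidates(results, threshold: float = 0.25, width: int = 64):
    """Keep the results that the sketch would send on for redundant verification."""
    return [v for v in results if needs_verification(v, threshold, width)]
```

For example, `significant_bits(255)` is 9, so a result of 255 scores 9/64 and is bypassed at the assumed threshold, while `1 << 40` needs 42 bits and is kept for verification. Which direction the real filter takes, and how the threshold is chosen, would have to be checked against the paper.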

Citations: 1

A low overhead hardware technique for software integrity and confidentiality

2007 25th International Conference on Computer Design Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601889

Austin Rogers, M. Milenkovic, A. Milenković

Citations: 8

Why we need statistical static timing analysis

2007 25th International Conference on Computer Design Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601885

C. Forzan, D. Pandini

Citations: 17

CMOS logic design with independent-gate FinFETs

2007 25th International Conference on Computer Design Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601953

Anish Muttreja, Niket Agarwal, N. Jha

Citations: 150

Maximizing the throughput-area efficiency of fully-parallel low-density parity-check decoding with C-slow retiming and asynchronous deep pipelining

2007 25th International Conference on Computer Design Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601964

Ming Su, Lili Zhou, C. Shi

Citations: 10