2010 IEEE International Conference on Computer Design: Latest Papers (Page 2)

Out-of-order retirement of instructions in sequentially consistent multiprocessors

2010 IEEE International Conference on Computer Design Pub Date : 2010-11-29 DOI: 10.1109/ICCD.2010.5647558

R. Ubal, J. Sahuquillo, S. Petit, P. López, D. Kaeli

Citations: 2

Toward reliable SRAM-based device identification

2010 IEEE International Conference on Computer Design Pub Date : 2010-11-29 DOI: 10.1109/ICCD.2010.5647724

Joonsoo Kim, Joonsoo Lee, J. Abraham

Citations: 9

A tag-based cache replacement

2010 IEEE International Conference on Computer Design Pub Date : 2010-11-29 DOI: 10.1109/ICCD.2010.5647602

Chuanjun Zhang, Bing Xue

Citations: 2

BDD-based circuit restructuring for reducing dynamic power

2010 IEEE International Conference on Computer Design Pub Date : 2010-11-29 DOI: 10.1109/ICCD.2010.5647524

Quang Dinh, Deming Chen, Martin D. F. Wong

Citations: 6

Temperature-to-power mapping

2010 IEEE International Conference on Computer Design Pub Date : 2010-11-29 DOI: 10.1109/ICCD.2010.5647690

Zhenyu Qi, B. Meyer, Wei Huang, R. J. Ribando, K. Skadron, M. Stan

Abstract: Accurate power maps are useful for power model validation, process variation characterization, leakage estimation, and power optimization, but are hard to measure directly. Deriving power maps from measured thermal maps is the inverse problem of the power-to-temperature mapping, extensively studied through thermal simulation. Until recently this inverse heat conduction problem has received little attention in the microarchitecture research community. This paper first identifies the source of difficulties for the problem. The inverse mapping is then performed by applying constraints from microarchitecture-level observations. The inherent large sensitivity of the resultant power map is minimized through thermal map filtering and constrained least-squares optimization. Choices of filter parameters and optimization constraints are investigated and their effects are evaluated. Furthermore, the paper highlights the differences between grid and block modeling in the inverse mapping, which were often ignored by previous schemes. The proposed methods reduce the mapping error by more than 10× compared to unoptimized solutions. To the best of our knowledge, this is the first work to quantitatively evaluate and minimize the noise effect in the temperature-to-power mapping problem at the microarchitecture level for both grid and block modes, and for the steady and transient cases.
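The constrained least-squares step named in the abstract above can be illustrated with a minimal sketch. It is not the authors' method: it assumes a steady-state block model in which temperature rise is a linear function of block power, T = R·P, and it recovers P by projected gradient descent under the single constraint that power is non-negative. The matrix `R`, the power values, and the function names are hypothetical, and the thermal map filtering step is not shown.

```python
def matvec(a, x):
    """Multiply matrix a (list of rows) by vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def transpose(a):
    """Return the transpose of matrix a as a list of rows."""
    return [list(col) for col in zip(*a)]


def nonneg_least_squares(r, temps, iters=500):
    """Minimize 0.5 * ||r p - temps||^2 subject to p >= 0.

    Uses projected gradient descent: take a gradient step, then clip
    negative powers to zero.
    """
    r_t = transpose(r)
    # The squared Frobenius norm bounds the largest eigenvalue of r^T r,
    # so its reciprocal is a safe (conservative) step size.
    step = 1.0 / sum(v * v for row in r for v in row)
    p = [0.0] * len(r[0])
    for _ in range(iters):
        residual = [m - t for m, t in zip(matvec(r, p), temps)]
        grad = matvec(r_t, residual)
        p = [max(0.0, p_i - step * g_i) for p_i, g_i in zip(p, grad)]
    return p


# Hypothetical thermal resistance matrix for three blocks (K/W).
# Off-diagonal entries model heat spreading between neighboring blocks.
R = [[2.0, 0.5, 0.1],
     [0.5, 2.0, 0.5],
     [0.1, 0.5, 2.0]]

true_power = [4.0, 1.0, 2.5]          # hypothetical block powers (W)
temp_rise = matvec(R, true_power)     # forward (power-to-temperature) map
estimate = nonneg_least_squares(R, temp_rise)
print([round(p, 3) for p in estimate])
```

This toy matrix is well conditioned, so the inversion is easy. The difficulty the abstract refers to arises when the forward map is ill conditioned and measurement noise is amplified, which is where filtering and additional constraints would matter.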

Citations: 25

Microarchitecture aware gate sizing: A framework for circuit-architecture co-optimization

2010 IEEE International Conference on Computer Design Pub Date : 2010-11-29 DOI: 10.1109/ICCD.2010.5647775

Sanghamitra Roy, Koushik Chakraborty

Citations: 2

Minimizing total area of low-voltage SRAM arrays through joint optimization of cell size, redundancy, and ECC

2010 IEEE International Conference on Computer Design Pub Date : 2010-11-29 DOI: 10.1109/ICCD.2010.5647605

Shiyu Zhou, S. Katariya, H. Ghasemi, S. Draper, N. Kim

Abstract: The increasing power consumption of processors has made power reduction a first-order priority in their design. Voltage scaling is one of the most successful power-reduction techniques introduced to date, but it is limited to some minimum voltage, VDDMIN, below which all components cannot operate reliably. In particular, ever-increasing process variability due to shrinking feature size further degrades the low-voltage reliability of, e.g., SRAM cells. Larger SRAM cells are less sensitive to process variability and their use would allow a reduction in VDDMIN. However, large-scale memory structures, e.g., last-level caches (LLCs) that often determine the VDDMIN of processors, cannot afford to use such large SRAM cells due to the resulting increase in die area. In this paper we propose a joint optimization of LLC cell size, number of redundant cells, and ECC (error-correction coding) strength to minimize total SRAM area while meeting target yields and VDDMIN. The use of redundant cells and ECC enables the use of smaller cell sizes while maintaining target yields and VDDMIN. Smaller cell sizes more than make up for the extra cells required by redundancy and ECC. We first assess each approach individually, i.e., only redundancy or ECC for various cell sizes. We then consider a combined approach and observe significant improvements. For example, in 32nm technology our combined approach yields a 27% reduction in total SRAM area (including redundant cells) when targeting a VDDMIN of 600mV.
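The trade-off described in the abstract above can be sketched as a small exhaustive search. This is an illustrative model under loud assumptions, not the paper's formulation: cell failures are treated as independent, redundancy is modeled as spare rows, ECC as a per-row code that corrects a fixed number of bits, and every number in `CELLS` (relative cell area and per-cell failure probability) is made up for the example. All function and constant names are hypothetical.

```python
from math import comb


def prob_at_most(n, k, p):
    """P[X <= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k + 1))


def array_yield(p_cell, data_bits, check_bits, correctable, rows, spare_rows):
    """Probability that the array is usable under the simplified model."""
    # A row is usable if ECC can correct all of its failing bits.
    word_ok = prob_at_most(data_bits + check_bits, correctable, p_cell)
    # The array is usable if every faulty row can be replaced by a spare.
    return prob_at_most(rows + spare_rows, spare_rows, 1.0 - word_ok)


def smallest_area(cells, ecc_options, spare_options, data_bits, rows, target):
    """Return (area, cell_area, check_bits, spare_rows) with minimum area
    among configurations meeting the target yield, or None if none does."""
    best = None
    for cell_area, p_cell in cells:
        for check_bits, correctable in ecc_options:
            for spares in spare_options:
                y = array_yield(p_cell, data_bits, check_bits,
                                correctable, rows, spares)
                if y < target:
                    continue
                # Total area counts check bits and spare rows as well.
                area = cell_area * (data_bits + check_bits) * (rows + spares)
                if best is None or area < best[0]:
                    best = (area, cell_area, check_bits, spares)
    return best


# Hypothetical (relative cell area, cell failure probability at VDDMIN).
CELLS = [(1.0, 1e-3), (1.2, 1e-4), (1.5, 1e-6), (2.0, 1e-9)]
# (check bits, correctable bits): no ECC, or a (72,64) single-error-correcting code.
ECC_OPTIONS = [(0, 0), (8, 1)]
SPARE_OPTIONS = [0, 4, 16]

joint = smallest_area(CELLS, ECC_OPTIONS, SPARE_OPTIONS, 64, 1024, 0.99)
no_protection = smallest_area(CELLS, [(0, 0)], [0], 64, 1024, 0.99)
print("joint optimum:", joint)
print("no ECC, no spares:", no_protection)
```

Comparing the two results shows the mechanism the abstract describes: with protection available, a smaller and less reliable cell can still meet the yield target, and the area saved per cell can outweigh the added check bits and spare rows.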

Citations: 42

DSS: Applying asynchronous techniques to architectures exploiting ILP at compile time

2010 IEEE International Conference on Computer Design Pub Date : 2010-11-29 DOI: 10.1109/ICCD.2010.5647721

Wei Shi, Zhiying Wang, Hongguang Ren, Ting Cao, Wei Chen, Bo Su, Hongyi Lu

Abstract: Embedded application environments require both high performance and low power. Architectures exploiting instruction-level parallelism (ILP) at compile time, such as very long instruction word (VLIW) and transport triggered architecture (TTA), may satisfy the requirements. They can be further enhanced by using asynchronous circuits to significantly reduce power consumption. As such, we are interested in asynchronous processors with architectures exploiting ILP at compile time. However, most current asynchronous processors are based on RISC-like architectures. When designing asynchronous VLIW or TTA processors, the distribution of control introduces some serious problems, and errors may occur because of the variable latencies of operations. This paper investigates asynchronous processors with architectures exploiting ILP at compile time. In order to overcome these problems, we propose a data source selecting (DSS) scheme to guarantee that instructions run correctly on asynchronous VLIW and TTA processors. Concretely, an asynchronous pipelined processor based on TTA is designed. The micro-architecture of the proposed asynchronous TTA processor is presented and an asynchronous processor named Tengyue is implemented using 180nm technology. The experimental results, for a range of benchmarks and working modes, show that the implemented asynchronous TTA processor with DSS scheme support runs correctly and power dissipation is reduced to about 43% to 65% of the equivalent synchronous processor.

Citations: 3

A radix-10 digit recurrence division unit with a constant digit selection function

2010 IEEE International Conference on Computer Design Pub Date : 2010-11-29 DOI: 10.1109/ICCD.2010.5647764

Malte Baesler, Sven-Ole Voigt, T. Teufel

Citations: 4

Elaboration-time synthesis of high-level language constructs in SystemC-based microarchitectural simulators

2010 IEEE International Conference on Computer Design Pub Date : 2010-11-29 DOI: 10.1109/ICCD.2010.5647583

Zhuo Ruan, Kurtis Cahill, D. Penry

Abstract: Structural modeling serves as an efficient method for creating detailed microarchitectural models of complex microprocessors. High-level language constructs such as templates and object polymorphism are used to achieve a high degree of code reuse, thereby reducing development time. However, these modeling frameworks are currently too slow to evaluate future designs of multicore microprocessors. The synthesis of portions of these models into hardware to form hybrid simulators promises to improve their speed substantially. Unfortunately, the high-level language constructs used in structural simulation frameworks are not typically synthesizable. One factor which limits their synthesis is that it is very difficult to determine statically what exactly the code and data to synthesize are. We propose an elaboration-time synthesis method for SystemC-based microarchitectural simulators. As part of the runtime environment of our infrastructure, the synthesis tool extracts architectural information after elaboration, binds dynamic information to a low-level intermediate representation (IR), and synthesizes the IR to VHDL. We show that this approach permits the synthesis of high-level language constructs which could not be easily synthesized before.

Citations: 2