2010 IEEE International Conference on Computer Design: Latest Papers

Out-of-order retirement of instructions in sequentially consistent multiprocessors
2010 IEEE International Conference on Computer Design Pub Date: 2010-11-29 DOI: 10.1109/ICCD.2010.5647558
R. Ubal, J. Sahuquillo, S. Petit, P. López, D. Kaeli
Abstract: Out-of-order retirement of instructions has been shown to be an effective technique to increase the number of in-flight instructions. This form of runtime scheduling can reduce pipeline stalls caused by head-of-line blocking effects in the reorder buffer (ROB). Wide instruction windows are very beneficial to multiprocessors that implement a strict memory model, especially when both loads and stores encounter long latencies due to cache misses, and whose stalls must be overlapped with instruction execution to overcome the memory gap. In this paper, the Validation Buffer (VB) multiprocessor architecture is proposed as a cost-effective, checkpoint-free, scalable approach to retire instructions out of program order, while still enforcing sequential consistency, and without impacting the memory hierarchy or interconnect. Experimental results show that utilizing the Validation Buffer can speed up both release and sequentially consistent in-order retirement in future multiprocessor systems by between 3% and 20%, depending on the ROB size.
Citations: 2
Toward reliable SRAM-based device identification
2010 IEEE International Conference on Computer Design Pub Date: 2010-11-29 DOI: 10.1109/ICCD.2010.5647724
Joonsoo Kim, Joonsoo Lee, J. Abraham
Abstract: Due to process variation, power-up values of embedded SRAM memory are unique for individual devices. They are used as SRAM fingerprints to identify integrated-circuits which is fundamental for security applications. The fingerprints, however, are sensitive to environmental changes. Consequently, during the identification process, errors may occur. To overcome this inherent nondeterminism, we provide a systematic approach to designing reliable SRAM-based identification system. We also discuss how to evaluate its system performance. We present a generic score-fusion-based matching recipe to identify devices with high confidence across a wide range of environmental conditions.
Citations: 9
A tag-based cache replacement
2010 IEEE International Conference on Computer Design Pub Date: 2010-11-29 DOI: 10.1109/ICCD.2010.5647602
Chuanjun Zhang, Bing Xue
Abstract: Conventional cache replacement policies use access information of each cache block for replacement decisions. We observe that there are many identical tags across different cache sets because programs exhibit spatial locality. The number of different tags in cache memory is significantly less than the total number of cache blocks in a cache. We propose a tag-based replacement that uses access frequency and recency of tags instead of cache blocks for the replacement decision. The tag-based replacement reduces the average miss rate of the baseline 1MB L2 cache by 15% over conventional LRU with 95% status bits reduction over conventional LRU. The performance improvement of a processor using the tag-based replacement is up to 40% with an average of 4.5% over LRU.
Citations: 2
BDD-based circuit restructuring for reducing dynamic power
2010 IEEE International Conference on Computer Design Pub Date: 2010-11-29 DOI: 10.1109/ICCD.2010.5647524
Quang Dinh, Deming Chen, Martin D. F. Wong
Abstract: As advances in process technology continue to scale down transistors, low power design is becoming more critical. Clock gating is a dynamic power saving technique that can freeze some flip-flops and prevent portion of the circuit from unneeded switching. In this paper, we consider fine-grained clock gating through pipelining, in which control signals from one pipeline stage are used to freeze some logic in the next pipeline stage. We present a novel BDD-based decomposition algorithm to restructure the circuit and expose possible control signals that would maximize power saving. We then use ILP formulation to select the optimal set of control signals for the circuit. We show that the constraint matrix is totally unimodular, and solve this selection problem optimally using linear programming. Comparing to a previous work [7], we get similar and 9% better dynamic power saving for small and medium circuits, respectively. For the largest MCNC circuits, which the previous technique cannot handle, we get an average of 19% dynamic power saving with 9.3% area overhead comparing to the original, non-restructured circuits.
Citations: 6
Temperature-to-power mapping
2010 IEEE International Conference on Computer Design Pub Date: 2010-11-29 DOI: 10.1109/ICCD.2010.5647690
Zhenyu Qi, B. Meyer, Wei Huang, R. J. Ribando, K. Skadron, M. Stan
Abstract: Accurate power maps are useful for power model validation, process variation characterization, leakage estimation, and power optimization, but are hard to measure directly. Deriving power maps from measured thermal maps is the inverse problem of the power-to-temperature mapping, extensively studied through thermal simulation. Until recently this inverse heat conduction problem has received little attention in the microarchitecture research community. This paper first identifies the source of difficulties for the problem. The inverse mapping is then performed by applying constraints from microarchitecture-level observations. The inherent large sensitivity of the resultant power map is minimized through thermal map-filtering and constrained least-squares optimization. Choices of filter parameters and optimization constraints are investigated and their effects are evaluated. Furthermore, the paper highlights the differences between the grid and block modeling in the inverse mapping which were often ignored by previous schemes. The proposed methods reduce the mapping error by more than 10× compared to unoptimized solutions. To our best knowledge this is the first work to quantitatively evaluate and minimize the noise effect in the temperature to power mapping problem at the microarchitecture level for both grid and block mode, and for the steady and transient case.
Citations: 25
Microarchitecture aware gate sizing: A framework for circuit-architecture co-optimization
2010 IEEE International Conference on Computer Design Pub Date: 2010-11-29 DOI: 10.1109/ICCD.2010.5647775
Sanghamitra Roy, Koushik Chakraborty
Abstract: Modern high performance microprocessors experience substantially lower utilization in many of their structural components. To recover energy efficiency from lower utilization, system architects resort to dynamic voltage frequency scaling (DVFS). In this paper, we demonstrate that dynamic adaptations using DVFS are markedly energy inefficient than techniques that design circuits ground up for lower performance. We propose a novel microarchitecture aware gate sizing and threshold voltage assignment algorithm to mitigate this current limitation. Our technique is the first of its kind that exploits architectural slack in gate sizing, and leverages on-chip redundancy and slack. We evaluate this circuit-architectural co-optimization framework in a superscalar processor by combining standard cell based gate sizing flows with state-of-the-art architectural simulation. Our results show 17–46% improvement in the datapath energy efficiency over traditional circuit designs incorporating DVFS schemes.
Citations: 2
Minimizing total area of low-voltage SRAM arrays through joint optimization of cell size, redundancy, and ECC
2010 IEEE International Conference on Computer Design Pub Date: 2010-11-29 DOI: 10.1109/ICCD.2010.5647605
Shiyu Zhou, S. Katariya, H. Ghasemi, S. Draper, N. Kim
Abstract: The increasing power consumption of processors has made power reduction a first-order priority in their design. Voltage scaling is one of the most successful power-reduction techniques introduced to date, but it is limited to some minimum voltage, VDDMIN, below which all components cannot operate reliably. In particular, ever-increasing process variability due to shrinking feature size further degrades the low-voltage reliability of, e.g., SRAM cells. Larger SRAM cells are less sensitive to process variability and their use would allow a reduction in VDDMIN. However, large-scale memory structures, e.g., last-level caches (LLCs) that often determine the VDDMIN of processors, cannot afford to use such large SRAM cells due to the resulting increase in die area. In this paper we propose a joint optimization of LLC cell size, number of redundant cells, and ECC (error-correction coding) strength to minimize total SRAM area while meeting target yields and VDDMIN. The use of redundant cells and ECC enable the use of smaller cell sizes while maintaining target yields and VDDMIN. Smaller cell sizes more than make up for the extra cells required by redundancy and ECC. We first assess each approach individually, i.e., only redundancy or ECC for various cell sizes. We then consider a combined approach and observe significant improvements. For example, in 32nm technology our combined approach yields a 27% reduction in total SRAM area (including redundant cells) when targeting a VDDMIN of 600mV.
Citations: 42
DSS: Applying asynchronous techniques to architectures exploiting ILP at compile time
2010 IEEE International Conference on Computer Design Pub Date: 2010-11-29 DOI: 10.1109/ICCD.2010.5647721
Wei Shi, Zhiying Wang, Hongguang Ren, Ting Cao, Wei Chen, Bo Su, Hongyi Lu
Abstract: Embedded application environments require both high performance and low power. Architectures exploiting instruction-level parallelism (ILP) at compile time, such as very long instruction word (VLIW) and transport triggered architecture (TTA), may satisfy the requirements. They can be further enhanced by using asynchronous circuits to significantly reduce power consumption. As such, we are interested in asynchronous processors with architectures exploiting ILP at compile time. However, most of the current asynchronous processors are based on RISC-like architectures. When designing asynchronous VLIW or TTA processors, the distribution of control introduces some serious problems, and errors may occur because of the variable latencies of operations. This paper investigates the asynchronous processor with architecture exploiting ILP at compile time. In order to overcome these problems, we propose a data source selecting (DSS) scheme to guarantee instructions run correctly on asynchronous VLIW and TTA processors. Concretely, an asynchronous pipelined processor based on TTA is designed. The micro-architecture of the proposed asynchronous TTA processor is presented and an asynchronous processor named Tengyue is implemented using 180nm technology. The experimental results, for a range of benchmarks and working modes, show that the implemented asynchronous TTA processor with DSS scheme support runs correctly and power dissipation is reduced to about 43% to 65% of the equivalent synchronous processor.
Citations: 3
A radix-10 digit recurrence division unit with a constant digit selection function
2010 IEEE International Conference on Computer Design Pub Date: 2010-11-29 DOI: 10.1109/ICCD.2010.5647764
Malte Baesler, Sven-Ole Voigt, T. Teufel
Abstract: Decimal floating point operations are important for applications that cannot tolerate errors from conversions between binary and decimal formats, for instance, scientific, commercial, and financial applications. In this work we present a radix-10 digit recurrence division algorithm that decomposes the quotient digits into three parts and requires only the computation of five and two times the divisor. Moreover, the divisor's multiples are selected without multiplexers and the digit selection functions are independent of the divisor's value and do not require a lookup table. The algorithm has been synthesized and verified on a Xilinx Virtex-5 FPGA and implementation results are given.
Citations: 4
Elaboration-time synthesis of high-level language constructs in SystemC-based microarchitectural simulators
2010 IEEE International Conference on Computer Design Pub Date: 2010-11-29 DOI: 10.1109/ICCD.2010.5647583
Zhuo Ruan, Kurtis Cahill, D. Penry
Abstract: Structural modeling serves as an efficient method for creating detailed microarchitectural models of complex microprocessors. High-level language constructs such as templates and object polymorphism are used to achieve a high degree of code reuse, thereby reducing development time. However, these modeling frameworks are currently too slow to evaluate future design of multicore microprocessors. The synthesis of portions of these models into hardware to form hybrid simulators promises to improve their speed substantially. Unfortunately, the high-level language constructs used in structural simulation frameworks are not typically synthesizable. One factor which limits their synthesis is that it is very difficult to determine statically what exactly the code and data to synthesize are. We propose an elaboration-time synthesis method for SystemC-based microarchitectural simulators. As part of the runtime environment of our infrastructure, the synthesis tool extracts architectural information after elaboration, binds dynamic information to a low-level intermediate representation (IR), and synthesizes the IR to VHDL. We show that this approach permits the synthesis of high-level language constructs which could not be easily synthesized before.
Citations: 2