2013 IEEE 31st International Conference on Computer Design (ICCD): Latest Papers, Page 2

DR-SNUCA: An energy-scalable dynamically partitioned cache

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657096

Anshuman Gupta, J. Sampson, M. Taylor

Citations: 3

Stochastic functions using sequential logic

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657094

N. Saraf, K. Bazargan, D. Lilja, Marc D. Riedel

Citations: 11

Functional Fmax test-time reduction using novel DFTs for circuit initialization

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657017

Ujjwal Guin, T. Chakraborty, M. Tehranipoor

Citations: 4

Assessing the impact of hard faults in performance components of modern microprocessors

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657044

N. Foutris, D. Gizopoulos, J. Kalamatianos, Vilas Sridharan

Abstract: A growing portion of the silicon area of modern high-performance microprocessors is dedicated to components that increase performance but do not determine functional correctness. Permanent hardware faults in these components can lead to performance fluctuation (not necessarily degradation) and do not produce functional errors. Although this fact has been identified previously, extensive research has not yet been conducted to accurately classify and quantify permanent faults in these components over a set of CPU benchmarks or measure the magnitude of the performance impact. Depending on the results of such studies, performance-related components of microprocessors can be disabled in fine or coarse granularities, salvaging microprocessor functionality at different performance levels. This paper analyzes the impact of permanent faults in the arrays and control logic of key microprocessor performance components such as the branch predictor, branch target buffer, return address stack, and data and instruction prefetchers. We apply a statistically safe fault injection campaign for single faults in performance components on a modified version of the cycle-accurate x86 architectural simulator PTLsim running the SPEC CPU2006 suite. Our evaluation reveals significant differences in the effect of faults and their performance impacts across the components as well as within each component (different fields). We classify faults for all components and analyze their IPC impact in the arrays and control logic. Our analysis shows that a very large fraction (44% to 96%) of permanent faults in these components leads only to performance fluctuation. Observation confirms the intuition that there are no functionality errors; however, many cases of a single fault in a performance component can significantly degrade microprocessor performance (2-20% average IPC reduction for SPEC CPU2006).

Citations: 18

Integrating thermocouple sensors into 3D ICs

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657046

Dawei Li, Ji-hoon Kim, S. Memik

Citations: 9

Program interference in MLC NAND flash memory: Characterization, modeling, and mitigation

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657034

Yu Cai, O. Mutlu, E. Haratsch, K. Mai

Abstract: As NAND flash memory continues to scale down to smaller process technology nodes, its reliability and endurance are degrading. One important source of reduced reliability is the phenomenon of program interference: when a flash cell is programmed to a value, the programming operation affects the threshold voltage of not only that cell, but also the other cells surrounding it. This interference potentially causes a surrounding cell to move to a logical state (i.e., a threshold voltage range) that is different from its original state, leading to an error when the cell is read. Understanding, characterizing, and modeling of program interference, i.e., how much the threshold voltage of a cell shifts when another cell is programmed, can enable the design of mechanisms that can effectively and efficiently predict and/or tolerate such errors. In this paper, we provide the first experimental characterization of and a realistic model for program interference in modern MLC NAND flash memory. To this end, we utilize the read-retry mechanism present in some state-of-the-art 2Y-nm (i.e., 20-24nm) flash chips to measure the changes in threshold voltage distributions of cells when a particular cell is programmed. Our results show that the amount of program interference received by a cell depends on 1) the location of the programmed cells, 2) the order in which cells are programmed, and 3) the data values of the cell that is being programmed as well as the cells surrounding it. Based on our experimental characterization, we develop a new model that predicts the amount of program interference as a function of threshold voltage values and changes in neighboring cells. We devise and evaluate one application of this model that adjusts the read reference voltage to the predicted threshold voltage distribution with the goal of minimizing erroneous reads. Our analysis shows that this new technique can reduce the raw flash bit error rate by 64% and thereby improve flash lifetime by 30%. We hope that the understanding and models developed in this paper lead to other error tolerance mechanisms for future flash memories.

Citations: 196
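
The abstract of the program-interference paper above describes predicting a victim cell's threshold-voltage shift from changes in neighboring cells and then moving the read reference voltage accordingly. The following is a minimal sketch of that idea, not the authors' model: it assumes the shift is a weighted sum of neighbor changes, and the `Neighbor` type, the coupling coefficients, and all voltage values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Neighbor:
    delta_vth: float  # threshold-voltage change of the neighbor when programmed (V)
    coupling: float   # hypothetical coupling coefficient toward the victim cell


def predicted_shift(neighbors):
    """Assumed linear model: victim shift = sum of coupling * neighbor change."""
    return sum(n.coupling * n.delta_vth for n in neighbors)


def adjusted_read_reference(nominal_ref, neighbors):
    """Move the read reference by the predicted interference."""
    return nominal_ref + predicted_shift(neighbors)


def read_state(vth, references):
    """Return the logical state index: the number of references below vth."""
    return sum(vth > r for r in references)


# A victim cell programmed just below a 2.0 V reference (state 0).
neighbors = [Neighbor(delta_vth=2.0, coupling=0.1),
             Neighbor(delta_vth=1.0, coupling=0.05)]
original_vth = 1.9
disturbed_vth = original_vth + predicted_shift(neighbors)

naive_state = read_state(disturbed_vth, [2.0])
corrected_state = read_state(disturbed_vth,
                             [adjusted_read_reference(2.0, neighbors)])
```

With these made-up numbers the disturbed cell crosses the nominal reference and is misread, while the shifted reference recovers the original state. Real coefficients would have to come from characterization data such as the measurements the abstract describes.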

A private level-1 cache architecture to exploit the latency and capacity tradeoffs in multicores operating at near-threshold voltages

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657029

Farrukh Hijaz, Qingchuan Shi, O. Khan

Abstract: Near-threshold voltage (NTV) operation is expected to enable up to 10× higher energy efficiency for future processors. However, reliable operation below a minimum voltage (Vccmin) cannot be guaranteed. Specifically, SRAM bit-cell error rates are expected to rise steeply since their margins can easily be violated at near-threshold voltages. Multicore processors rely on fast private L1 caches to exploit data locality and achieve high performance. In the presence of high bit-cell error rates, an L1 cache can either sacrifice capacity or incur additional latency to correct the errors. We observe that L1 cache sensitivity to hit latency offers a design tradeoff between capacity and latency. When the error rate is high at extreme Vccmin, it is worthwhile incurring additional latency to recover and utilize the additional L1 cache capacity. However, at low error rates, the additional constant latency to recover cache capacity degrades performance. With this tradeoff in mind, we propose a novel private L1 cache architecture that dynamically learns and adapts by either recovering cache capacity at the cost of additional latency overhead, or operating at lower capacity while utilizing the benefits of optimal hit latency. Using simulations of a 64-core multicore, we demonstrate that our adaptive L1 cache architecture performs better than both individual schemes at low and high error rates (i.e., various NTV conditions).

Citations: 13
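
The capacity-versus-latency tradeoff in the L1 cache abstract above can be illustrated with a simple average memory access time (AMAT) comparison. This is only a sketch under assumed numbers: the abstract does not say how the proposed cache learns which mode to use, so the AMAT-based rule, the function names, and every parameter value below are hypothetical.

```python
def amat(hit_latency, miss_rate, miss_penalty):
    """Average memory access time in cycles: hit latency plus expected miss cost."""
    return hit_latency + miss_rate * miss_penalty


def choose_mode(base_hit, extra_latency, miss_rate_recovered,
                miss_rate_reduced, miss_penalty):
    """Pick the mode with the lower estimated AMAT.

    recover_capacity: error correction keeps faulty lines usable, so the miss
                      rate stays low, but every hit pays extra_latency cycles.
    reduced_capacity: faulty lines are disabled, so hits stay fast, but the
                      smaller effective cache misses more often.
    """
    recover = amat(base_hit + extra_latency, miss_rate_recovered, miss_penalty)
    reduced = amat(base_hit, miss_rate_reduced, miss_penalty)
    return "recover_capacity" if recover < reduced else "reduced_capacity"


# Low error rate: few lines are lost, so the fast reduced-capacity mode wins.
low_error = choose_mode(base_hit=1, extra_latency=1, miss_rate_recovered=0.04,
                        miss_rate_reduced=0.05, miss_penalty=20)

# High error rate: many lines are lost, so paying extra hit latency is worthwhile.
high_error = choose_mode(base_hit=1, extra_latency=1, miss_rate_recovered=0.04,
                         miss_rate_reduced=0.30, miss_penalty=20)
```

The two calls mirror the abstract's observation: at low error rates the constant extra hit latency is not worth it, while at high error rates the recovered capacity pays for itself.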

Efficient floating-point representation for balanced codes for FPGA devices

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657053

J. Villalba, J. Hormigo, F. Corbera, Mario A. González, E. Zapata

Citations: 5

A low-jitter phase-locked resonant clock generation and distribution scheme

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657089

Ayan Mandal, Kalyana C. Bollapalli, N. Jayakumar, S. Khatri, R. Mahapatra

Abstract: Clock distribution networks have traditionally been optimized to minimize end-to-end delay of the distribution network. However, since most digital ICs have an on-chip PLL, a more relevant design goal is to minimize cycle-to-cycle jitter. In this paper, we present a novel low-jitter phase-locked clock generation and distribution methodology which uses resonant standing wave oscillators (SWOs). In contrast to traveling wave oscillator rings (TWOs or "rotary" clocks), our SWO achieves the same phase at every point in the ring, making it amenable to a synchronous design methodology. The standing wave oscillator is controlled by coarse as well as fine tuning. Coarse tuning is achieved by varying the ring inductance, while fine tuning is accomplished by varying the ring capacitance. Clock distribution is done by routing the resonant ring chip-wide in a comb-like manner. Experimental results demonstrate that the cycle-to-cycle jitter and skew of our approach are dramatically lower than those of existing schemes, while the power consumption is significantly lower as well. These benefits occur due to the resonant nature of our SWO-based clock generation and distribution approach.

Citations: 4
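
The resonant clock abstract above tunes the oscillator coarsely through ring inductance and finely through ring capacitance. As a rough illustration only, the sketch below treats the ring as a lumped LC resonator with f = 1 / (2π√(LC)); this lumped approximation and the component values are assumptions, since the paper's oscillator is a distributed standing-wave structure whose exact frequency expression is not given in the abstract.

```python
import math


def resonant_frequency(inductance_h, capacitance_f):
    """Resonant frequency in Hz of an ideal lumped LC tank (assumed model)."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


# Hypothetical ring parameters: 1 nH and 1 pF.
L0, C0 = 1e-9, 1e-12
f_nominal = resonant_frequency(L0, C0)

# Coarse step: a large change in inductance moves the frequency a lot.
f_coarse = resonant_frequency(4 * L0, C0)

# Fine step: a 1% change in capacitance nudges the frequency slightly.
f_fine = resonant_frequency(L0, 1.01 * C0)
```

Because frequency scales with the inverse square root of L·C, quadrupling the inductance halves the frequency, whereas a 1% capacitance change shifts it by roughly 0.5%.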

Memory-centric accelerator design for Convolutional Neural Networks

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657019

Maurice Peemen, A. Setio, B. Mesman, H. Corporaal

Abstract: In the near future, cameras will be used everywhere as flexible sensors for numerous applications. For mobility and privacy reasons, the required image processing should be local on embedded computer platforms with performance requirements and energy constraints. Dedicated acceleration of Convolutional Neural Networks (CNN) can achieve these targets with enough flexibility to perform multiple vision tasks. A challenging problem for the design of efficient accelerators is the limited amount of external memory bandwidth. We show that the effects of the memory bottleneck can be reduced by a flexible memory hierarchy that supports the complex data access patterns in CNN workloads. The efficiency of the on-chip memories is maximized by our scheduler that uses tiling to optimize for data locality. Our design flow ensures that on-chip memory size is minimized, which reduces area and energy usage. The design flow is evaluated by a High Level Synthesis implementation on a Virtex 6 FPGA board. Compared to accelerators with standard scratchpad memories, the FPGA resources can be reduced by up to 13× while maintaining the same performance. Alternatively, when the same amount of FPGA resources is used, our accelerators are up to 11× faster.

Citations: 274
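
The CNN accelerator abstract above relies on tiling to improve data locality in on-chip memory. The sketch below shows the general idea on a single 2D convolution in plain Python: each output tile is computed from a small local buffer, and the number of pixels copied into that buffer stands in for external memory traffic. It is an illustration under assumed tile sizes, not the scheduler or memory hierarchy from the paper, and the function names are hypothetical.

```python
def conv2d_direct(image, kernel):
    """Reference 'valid' 2D convolution (correlation form, no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(ow)] for y in range(oh)]


def conv2d_tiled(image, kernel, tile_h, tile_w):
    """Same result, computed one output tile at a time from a local buffer.

    Returns (output, external_loads), where external_loads counts pixels
    copied from `image` (standing in for external memory) into the buffer.
    """
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    external_loads = 0
    for ty in range(0, oh, tile_h):
        for tx in range(0, ow, tile_w):
            th = min(tile_h, oh - ty)
            tw = min(tile_w, ow - tx)
            # Load the input patch this tile needs, including the kernel halo.
            buf = [row[tx:tx + tw + kw - 1]
                   for row in image[ty:ty + th + kh - 1]]
            external_loads += len(buf) * len(buf[0])
            # All multiply-accumulates for the tile read only the local buffer.
            for y in range(th):
                for x in range(tw):
                    out[ty + y][tx + x] = sum(
                        buf[y + i][x + j] * kernel[i][j]
                        for i in range(kh) for j in range(kw))
    return out, external_loads


image = [[r * 6 + c for c in range(6)] for r in range(6)]
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

reference = conv2d_direct(image, kernel)
small_tiles, loads_small = conv2d_tiled(image, kernel, 2, 2)
one_tile, loads_one = conv2d_tiled(image, kernel, 4, 4)
```

Larger tiles reuse more of each loaded pixel but need a bigger buffer, which is the locality-versus-on-chip-memory balance the abstract attributes to its scheduler.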