2013 IEEE 31st International Conference on Computer Design (ICCD): Latest Papers

DR-SNUCA: An energy-scalable dynamically partitioned cache
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date: 2013-11-07 DOI: 10.1109/ICCD.2013.6657096
Anshuman Gupta, J. Sampson, M. Taylor
Abstract: Multicore processors have become ubiquitous across many domains, such as datacenters and smartphones. As the number of processing elements increases within these processors, so does the pressure to share the critical on-chip cache resources, but this must be done energy-efficiently and without sacrificing resource guarantees. We propose a scalable dynamic cache-partitioning scheme, DR-SNUCA, which provides an energy-efficient way to reduce resource interference over caches shared among many processing elements. Our results show that DR-SNUCA reduces system energy consumption by 16.3% compared to associatively partitioned caches, such as DNUCA.
Citations: 3
Stochastic functions using sequential logic
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date: 2013-11-07 DOI: 10.1109/ICCD.2013.6657094
N. Saraf, K. Bazargan, D. Lilja, Marc D. Riedel
Abstract: Stochastic computing is a novel approach to real arithmetic, offering better error tolerance and lower hardware costs than conventional implementations. Stochastic modules are digital systems that process random bit streams representing real values in the unit interval. Stochastic modules based on finite state machines (FSMs) have been shown to realize complicated arithmetic functions much more efficiently than combinational stochastic modules. However, a general approach to synthesize FSMs for realizing arbitrary functions has been elusive. We describe a systematic procedure to design FSMs that implement arbitrary real-valued functions in the unit interval using the Taylor series approximation.
Citations: 11
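Editorial illustration (not taken from the paper): the abstract above describes bit streams that represent real values in the unit interval. A minimal sketch of that representation follows, assuming the common convention that a value p is encoded as a stream in which each bit is 1 with probability p. It shows the classic combinational case, where ANDing two independent streams approximates the product. The function names are hypothetical, and the FSM synthesis procedure the paper proposes is not reproduced here.

```python
import random


def encode(p, n, rng):
    """Return n random bits, each 1 with probability p (p in [0, 1])."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def decode(bits):
    """Estimate the encoded value as the fraction of 1s in the stream."""
    return sum(bits) / len(bits)


def multiply(a_bits, b_bits):
    """Bitwise AND of two independent streams encodes the product."""
    return [a & b for a, b in zip(a_bits, b_bits)]


rng = random.Random(0)          # fixed seed so the run is repeatable
n = 100_000                     # longer streams give lower estimation error
x = encode(0.5, n, rng)
y = encode(0.4, n, rng)
product_estimate = decode(multiply(x, y))
print(product_estimate)         # close to 0.5 * 0.4 = 0.2
```

The estimate is noisy by construction: its standard deviation shrinks roughly with the square root of the stream length, which is the usual accuracy-versus-latency tradeoff in this style of arithmetic.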
Functional Fmax test-time reduction using novel DFTs for circuit initialization
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date: 2013-11-07 DOI: 10.1109/ICCD.2013.6657017
Ujjwal Guin, T. Chakraborty, M. Tehranipoor
Abstract: Using functional test for Fmax analysis is still the only effective method used in practice, in spite of the fact that the test cost associated with functional Fmax test remains a major problem. In this paper, we develop novel design-for-testability (DFT) structures to considerably reduce the cost of initializing the circuit during functional test. The proposed architectures take advantage of existing DFT structures to reduce the overall cost of hardware and have no impact on the circuit timing. Our implementations of these DFT structures for initializing ITC'99 benchmark circuit b19 demonstrate the effectiveness of these techniques in reducing test time and thus the overall test cost.
Citations: 4
Assessing the impact of hard faults in performance components of modern microprocessors
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date: 2013-11-07 DOI: 10.1109/ICCD.2013.6657044
N. Foutris, D. Gizopoulos, J. Kalamatianos, Vilas Sridharan
Abstract: A growing portion of the silicon area of modern high-performance microprocessors is dedicated to components that increase performance but do not determine functional correctness. Permanent hardware faults in these components can lead to performance fluctuation (not necessarily degradation) and do not produce functional errors. Although this fact has been identified previously, extensive research has not yet been conducted to accurately classify and quantify permanent faults in these components over a set of CPU benchmarks or measure the magnitude of the performance impact. Depending on the results of such studies, performance-related components of microprocessors can be disabled in fine or coarse granularities, salvaging microprocessor functionality at different performance levels. This paper analyzes the impact of permanent faults in the arrays and control logic of key microprocessor performance components such as the branch predictor, branch target buffer, return address stack, and data and instruction prefetchers. We apply a statistically safe fault injection campaign for single faults in performance components on a modified version of the cycle-accurate x86 architectural simulator PTLsim running the SPEC CPU2006 suite. Our evaluation reveals significant differences in the effect of faults and their performance impacts across the components as well as within each component (different fields). We classify faults for all components and analyze their IPC impact in the arrays and control logic. Our analysis shows that a very large fraction (44% to 96%) of permanent faults in these components leads only to performance fluctuation. Observation confirms the intuition that there are no functionality errors; however, many cases of a single fault in a performance component can significantly degrade microprocessor performance (2-20% average IPC reduction for SPEC CPU2006).
Citations: 18
Integrating thermocouple sensors into 3D ICs
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date: 2013-11-07 DOI: 10.1109/ICCD.2013.6657046
Dawei Li, Ji-hoon Kim, S. Memik
Abstract: In this paper, we present a novel architecture for embedding bi-metallic thermocouple based temperature sensors into 3D IC stacks. To the best of our knowledge, this is the first work addressing this specific integration problem. Our architecture uses dedicated vias to thermally couple sensors in the metal layer with the hotspots to be monitored in the active layer throughout the multi-stack structures. We propose a low cost solution by leveraging a fraction of existing thermal TSVs for this purpose. Through thermal modeling and simulation using a state-of-the-art tool (FloTHERM), we demonstrate that we can achieve high accuracy (less than 1°C error) in temperature tracking while still maintaining the effectiveness of the thermal TSVs in heat management (conforming to a fixed peak temperature threshold of 95°C).
Citations: 9
Program interference in MLC NAND flash memory: Characterization, modeling, and mitigation
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date: 2013-11-07 DOI: 10.1109/ICCD.2013.6657034
Yu Cai, O. Mutlu, E. Haratsch, K. Mai
Abstract: As NAND flash memory continues to scale down to smaller process technology nodes, its reliability and endurance are degrading. One important source of reduced reliability is the phenomenon of program interference: when a flash cell is programmed to a value, the programming operation affects the threshold voltage of not only that cell, but also the other cells surrounding it. This interference potentially causes a surrounding cell to move to a logical state (i.e., a threshold voltage range) that is different from its original state, leading to an error when the cell is read. Understanding, characterizing, and modeling of program interference, i.e., how much the threshold voltage of a cell shifts when another cell is programmed, can enable the design of mechanisms that can effectively and efficiently predict and/or tolerate such errors. In this paper, we provide the first experimental characterization of and a realistic model for program interference in modern MLC NAND flash memory. To this end, we utilize the read-retry mechanism present in some state-of-the-art 2Y-nm (i.e., 20-24nm) flash chips to measure the changes in threshold voltage distributions of cells when a particular cell is programmed. Our results show that the amount of program interference received by a cell depends on 1) the location of the programmed cells, 2) the order in which cells are programmed, and 3) the data values of the cell that is being programmed as well as the cells surrounding it. Based on our experimental characterization, we develop a new model that predicts the amount of program interference as a function of threshold voltage values and changes in neighboring cells. We devise and evaluate one application of this model that adjusts the read reference voltage to the predicted threshold voltage distribution with the goal of minimizing erroneous reads. Our analysis shows that this new technique can reduce the raw flash bit error rate by 64% and thereby improve flash lifetime by 30%. We hope that the understanding and models developed in this paper lead to other error tolerance mechanisms for future flash memories.
Citations: 196
A private level-1 cache architecture to exploit the latency and capacity tradeoffs in multicores operating at near-threshold voltages
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date: 2013-11-07 DOI: 10.1109/ICCD.2013.6657029
Farrukh Hijaz, Qingchuan Shi, O. Khan
Abstract: Near-threshold voltage (NTV) operation is expected to enable up to 10× energy-efficiency for future processors. However, reliable operation below a minimum voltage (Vccmin) cannot be guaranteed. Specifically, SRAM bit-cell error rates are expected to rise steeply since their margins can easily be violated at near-threshold voltages. Multicore processors rely on fast private L1 caches to exploit data locality and achieve high performance. In the presence of high bit-cell error rates, an L1 cache can either sacrifice capacity or incur additional latency to correct the errors. We observe that L1 cache sensitivity to hit latency offers a design tradeoff between capacity and latency. When the error rate is high at extreme Vccmin, it is worthwhile incurring additional latency to recover and utilize the additional L1 cache capacity. However, at low error rates, the additional constant latency to recover cache capacity degrades performance. With this tradeoff in mind, we propose a novel private L1 cache architecture that dynamically learns and adapts by either recovering cache capacity at the cost of additional latency overhead, or operating at lower capacity while utilizing the benefits of optimal hit latency. Using simulations of a 64-core multicore, we demonstrate that our adaptive L1 cache architecture performs better than both individual schemes at low and high error rates (i.e., various NTV conditions).
Citations: 13
Efficient floating-point representation for balanced codes for FPGA devices
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date: 2013-11-07 DOI: 10.1109/ICCD.2013.6657053
J. Villalba, J. Hormigo, F. Corbera, Mario A. González, E. Zapata
Abstract: We propose a floating-point representation to deal efficiently with arithmetic operations in codes with a balanced number of additions and multiplications for FPGA devices. The variable shift operation is very slow in these devices. We propose a format that reduces the variable shifter penalty. It is based on a radix-64 representation such that the number of the possible shifts is considerably reduced. Thus, the execution time of the floating-point addition is highly optimized when it is performed in an FPGA device, which compensates for the multiplication penalty when a high radix is used, as experimental results have shown. Consequently, the main problem of previous specific high-radix FPGA designs (no speedup for codes with a balanced number of multiplications and additions) is overcome with our proposal. The inherent architecture supporting the new format works with greater bit precision than the corresponding single precision (SP) IEEE-754 standard.
Citations: 5
A low-jitter phase-locked resonant clock generation and distribution scheme
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date: 2013-11-07 DOI: 10.1109/ICCD.2013.6657089
Ayan Mandal, Kalyana C. Bollapalli, N. Jayakumar, S. Khatri, R. Mahapatra
Abstract: Clock distribution networks have traditionally been optimized to minimize end-to-end delay of the distribution network. However, since most digital ICs have an on-chip PLL, a more relevant design goal is to minimize cycle-to-cycle jitter. In this paper, we present a novel low-jitter phase-locked clock generation and distribution methodology which uses resonant standing wave oscillators (SWOs). In contrast to traveling wave oscillator rings (TWOs or “rotary” clocks), our SWO achieves the same phase at every point in the ring, making it amenable to a synchronous design methodology. The standing wave oscillator is controlled by coarse as well as fine tuning. Coarse tuning is achieved by varying the ring inductance, while fine tuning is accomplished by varying the ring capacitance. Clock distribution is done by routing the resonant ring chip-wide in a “comb”-like manner. Experimental results demonstrate that the cycle-to-cycle jitter and skew of our approach are dramatically lower than existing schemes, while the power consumption is significantly lower as well. These benefits occur due to the resonant nature of our SWO-based clock generation and distribution approach.
Citations: 4
Memory-centric accelerator design for Convolutional Neural Networks
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date: 2013-11-07 DOI: 10.1109/ICCD.2013.6657019
Maurice Peemen, A. Setio, B. Mesman, H. Corporaal
Abstract: In the near future, cameras will be used everywhere as flexible sensors for numerous applications. For mobility and privacy reasons, the required image processing should be local on embedded computer platforms with performance requirements and energy constraints. Dedicated acceleration of Convolutional Neural Networks (CNN) can achieve these targets with enough flexibility to perform multiple vision tasks. A challenging problem for the design of efficient accelerators is the limited amount of external memory bandwidth. We show that the effects of the memory bottleneck can be reduced by a flexible memory hierarchy that supports the complex data access patterns in CNN workloads. The efficiency of the on-chip memories is maximized by our scheduler that uses tiling to optimize for data locality. Our design flow ensures that on-chip memory size is minimized, which reduces area and energy usage. The design flow is evaluated by a High Level Synthesis implementation on a Virtex 6 FPGA board. Compared to accelerators with standard scratchpad memories, the FPGA resources can be reduced by up to 13× while maintaining the same performance. Alternatively, when the same amount of FPGA resources is used, our accelerators are up to 11× faster.
Citations: 274
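Editorial illustration (not taken from the paper): the abstract above mentions tiling to improve data locality. A minimal sketch of the general idea follows, assuming a single-channel "valid" convolution in the correlation form commonly used in CNNs. Each output tile first copies the input patch it needs into a small local buffer, standing in for on-chip memory, and then reuses that buffer for every output in the tile. The function names and the tile size are hypothetical, and the paper's scheduler and memory hierarchy are not reproduced here.

```python
def conv2d(image, kernel):
    """Direct 'valid' 2D convolution (correlation form)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(ow)] for r in range(oh)]


def conv2d_tiled(image, kernel, tile):
    """Same result, computed one (tile x tile) output block at a time."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for r0 in range(0, oh, tile):
        for c0 in range(0, ow, tile):
            rows = min(tile, oh - r0)   # edge tiles may be smaller
            cols = min(tile, ow - c0)
            # Local buffer: the (rows + kh - 1) x (cols + kw - 1) input
            # patch that this output block depends on.
            buf = [image[r][c0:c0 + cols + kw - 1]
                   for r in range(r0, r0 + rows + kh - 1)]
            block = conv2d(buf, kernel)
            for r in range(rows):
                out[r0 + r][c0:c0 + cols] = block[r]
    return out


image = [[(3 * r + 5 * c) % 7 for c in range(10)] for r in range(9)]
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
direct = conv2d(image, kernel)
tiled = conv2d_tiled(image, kernel, tile=4)
print(direct == tiled)
```

The buffer size depends only on the tile and kernel dimensions, not on the image size, which is why tile selection can trade on-chip memory against the number of external memory fetches.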