{"title":"Enhancing 3T DRAMs for SRAM replacement under 10nm tri-gate SOI FinFETs","authors":"Z. Jaksic, R. Canal","doi":"10.1109/ICCD.2012.6378657","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378657","url":null,"abstract":"In this paper, we present the dynamic 3T memory cell for future 10nm tri-gate FinFETs as a potential replacement for classical 6T SRAM cell for implementation in high speed cache memories. We investigate read access time, retention time, and static power consumption of the cell when it is exposed to the effects of process and environmental variations. Process variations are extracted from the ITRS predictions and they are modeled at device level. For simulation, we use 10nm SOI tri-gate FinFET BSIM-CMG model card developed by the University of Glasgow, Device Modeling Group. When compared to the classical 6T SRAM, 3T cell has 40% smaller area, leakage is reduced up to 14 times while access time is approximately the same. In order to achieve higher retention times, we propose several cell extensions which, at the same time, enable post-fabrication/run-time adaptability.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126789034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Backpressure: Efficient buffer management for on-chip networks","authors":"Daniel U. Becker, Nan Jiang, George Michelogiannakis, W. Dally","doi":"10.1109/ICCD.2012.6378673","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378673","url":null,"abstract":"This paper introduces Adaptive Backpressure, a novel scheme that improves the utilization of dynamically managed router input buffers by continuously adjusting the stiffness of the flow control feedback loop in response to observed traffic conditions. Through a simple extension to the router's flow control mechanism, the proposed scheme heuristically limits the number of credits available to individual virtual channels based on estimated downstream congestion, aiming to minimize the amount of buffer space that is occupied unproductively. This leads to more efficient distribution of buffer space and improves isolation between multiple concurrently executing workloads with differing performance characteristics. Experimental results for a 64-node mesh network show that Adaptive Backpressure improves network stability, leading to an average 2.6× increase in throughput under heavy load across traffic patterns. In the presence of background traffic, the proposed scheme reduces zero-load latency by an average of 31%. Finally, it mitigates the performance degradation encountered when latency- and throughput-optimized execution cores contend for network resources in a heterogeneous chip multi-processor; across a set of PARSEC benchmarks, we observe an average reduction in execution time of 34%.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128155836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient reliability simulation flow for evaluating the hot carrier injection effect in CMOS VLSI circuits","authors":"M. Kamal, Q. Xie, M. Pedram, A. Afzali-Kusha, S. Safari","doi":"10.1109/ICCD.2012.6378663","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378663","url":null,"abstract":"Hot carrier injection (HCI) effect is one of the major reliability concerns in VLSI circuits. This paper presents a scalable reliability simulation flow, including a logic cell characterization method and an efficient full chip simulation method, to analyze the HCI-induced transistor aging with a fast run time and high accuracy. The transistor-level HCI effect is modeled based on the Reaction-Diffusion (R-D) framework. The gate-level HCI impact characterization method combines HSpice simulation and piecewise linear curve fitting. The proposed characterization method reveals that the HCI effect on some transistors is much more significant than the others according to the logic cell structure. Additionally, during the circuit simulation, pertinent transitions are identified and all cells in the circuit are classified into two groups: critical and non-critical. The proposed method reduces the simulation time while maintaining high accuracy by applying fine granularity simulation time steps to the critical cells and coarse granularity ones to the non-critical cells in the circuit.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124267521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"WaveSync: A low-latency source synchronous bypass network-on-chip architecture","authors":"Yoon Seok Yang, Reeshav Kumar, G. Choi, Paul V. Gratz","doi":"10.1109/ICCD.2012.6378647","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378647","url":null,"abstract":"WaveSync is a low-latency focused, network-on-chip architecture for globally-asynchronous locally-synchronous (GALS) designs. WaveSync facilitates low-latency communication leveraging the source-synchronous clock sent with the data, to time components in the downstream routers data-path to reduce the number of synchronizations needed. WaveSync accomplishes this by partitioning the router components at each node into different clock-domains, each synchronized with one of the orthogonal incoming source synchronous clocks in a GALS 2D mesh network. The data and clock subsequently propagate through each node/router, synchronously, until the destination is reached, regardless of the number of hops it may take. As long as the data travel in the path of clock propagation, and no congestion is encountered, it will be propagated without latching, as if in a long-combinatorial path, with both the clock and the data accruing delay at the same rate. The result is that the need for synchronization between the mesochronous nodes and/or the asynchronous control associated with typical GALS network is completely eliminated. The proposed WaveSync network outperforms conventional GALS networks by 87-90% in average nanosecond latency with 1.8-6.5 times more throughput across synthetic traffic patterns and SPLASH-2 benchmark suite.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114078796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SECRET: Selective error correction for refresh energy reduction in DRAMs","authors":"Chung-Hsiang Lin, De-Yu Shen, Yi-Jung Chen, Chia-Lin Yang, Cheng-Yuan Michael Wang","doi":"10.1109/ICCD.2012.6378619","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378619","url":null,"abstract":"DRAMs are used as the main memory in most computing systems today. Studies show that DRAMs contribute to a significant part of overall system power consumption. Therefore, one of the main challenges in low-power DRAM design is the inevitable refresh process. Due to process variation, memory cells exhibit retention time variations. Current DRAMs use a single worst-case refresh period. Prolonging refresh intervals introduces retention errors. Previous works adopt conventional ECC (Error Correcting Code) to correct retention errors. These approaches introduce significant area and energy overheads. In this paper, we propose a novel error correction framework for retention errors in DRAMs, called SECRET (Selective Error Correction for Refresh Energy reducTion). The key observation we make is that retention errors can be treated as hard errors rather than soft errors, and only few DRAM cells have large leakage. Therefore, instead of equipping error correction capability in all memory cells as existing ECC schemes, we only allocate error correction information to leaky cells under a refresh interval. Our SECRET framework contains two parts, an off-line phase to identify memory cells with retention errors given a target error rate, and a low-overhead error correction mechanism. The experimental results show that the proposed SECRET framework can reduce refresh power by 87.2%, and overall DRAM power by 18.57% with negligible area and performance overheads.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114798386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CoNA: Dynamic application mapping for congestion reduction in many-core systems","authors":"Mohammad Fattah, Marco Ramírez, M. Daneshtalab, P. Liljeberg, J. Plosila","doi":"10.1109/ICCD.2012.6378665","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378665","url":null,"abstract":"Increasing the number of processors in a single chip toward network-based many-core systems requires a run-time task allocation algorithm. We propose an efficient mapping algorithm that assigns communicating tasks of incoming applications onto resources of a many-core system utilizing Network-on-Chip paradigm. In our contiguous neighborhood allocation (CoNA) algorithm, we target at the reduction of both internal and external congestion due to detrimental impact of congestion on the network performance. We approach the goal by keeping the mapped region contiguous and placing the communicating tasks in a close neighborhood. A completely synthesizable simulation environment where none of the system objects are assumed to be ideal is provided. Experiments show at least 40% gain in different mapping cost functions, as well as 16% reduction in average network latency compared to existing algorithms.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116622882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel profiled side-channel attack in presence of high Algorithmic Noise","authors":"Mostafa M. I. Taha, P. Schaumont","doi":"10.1109/ICCD.2012.6378675","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378675","url":null,"abstract":"Understanding the nature of hardware designs is a vital element in a successful Side-Channel Analysis. The inherent parallelism of these designs adds excessive Algorithmic Noise in the power consumption trace, which makes it difficult to mount a successful power attack against it. In this paper, we address this high Algorithmic Noise with a novel profiled attack that is generic and independent of any specific cryptographic algorithm. We propose both a new profiling phase and two new insights in the attack phase. The proposed profiling technique takes the high design parallelism into consideration, which results in a more accurate power model. In the attack phase, we first define two new targeted regions in the power trace, then aggregate the attack results from each of them to get a more powerful attack phase. The proposed attack model has been tested on the 128bit AES of the widely known DPA Contest (V2) and achieved a stable 80% Global Success Rate (GSR) at 2755 traces.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123227201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Retrospective on “Power-Sensitive Multithreaded Architecture”","authors":"J. Seng, D. Tullsen, George Z. N. Cai","doi":"10.1109/ICCD.2012.6378609","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378609","url":null,"abstract":"This article provides a retrospective look at the research that went into the 2000 ICCD paper “Power-Sensitive Multithreaded Architecture”. At the time, simultaneous multithreading processors were soon to be commercially available and power consumption was proving to be a challenging design constraint. That research introduced optimizations that increased power and energy efficiency through multithreading, while maintaining performance. This article discusses the optimizations in the paper and discusses how processor designs have changed since its publication.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123708016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and evaluation of a delay-based FPGA Physically Unclonable Function","authors":"Aaron Mills, Sudhanshu Vyas, Michael Patterson, Christopher Sabotta, Phillip H. Jones, Joseph Zambreno","doi":"10.1109/ICCD.2012.6378632","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378632","url":null,"abstract":"A new Physically Unclonable Function (PUF) variant was developed on an FPGA, and its quality evaluated. It is conceptually similar to PUFs developed using standard SRAM cells, except it utilizes general FPGA reconfigurable fabric, which offers several advantages. Comparison between our approach and other PUF designs indicates that our design is competitive in terms of repeatability within a given instance, and uniqueness between instances. The design can also be tuned to achieve desired response characteristics which broadens the potential range of applications.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130093084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cloud computing: Virtualization and resiliency for data center computing","authors":"V. Salapura","doi":"10.1109/ICCD.2012.6378606","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378606","url":null,"abstract":"Cloud computing is being rapidly adopted across the IT industry, driven by the need to reduce the total cost of ownership of increasingly more demanding workloads. Within companies, private clouds are offering a more efficient way to manage and use private data centers. In the broader marketplace, public clouds offer the promise of buying computing capabilities based on a utility model. This utility model enables IT consumers to purchase compute resources on demand to fit current business needs and scale expenses associated with computing resources. Thus, cloud computing offers IT to be treated as an ongoing variable operating expense billed by usage rather than requiring capital expenditures that must be planned years in advance. Advantageously, operating expenses can be charged against the revenue generated by these expenses directly. In contrast, capital expenses incurred by the purchase of a system need to be paid at the time of purchase, but can only be depreciated to reduce the taxable income over the lifetime of the system.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125879370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}