2013 IEEE 31st International Conference on Computer Design (ICCD): Latest Papers, Page 3

SLIDER: Smart Late Injection DEflection Router for mesh NoCs

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657068

Bhawna Nayak, John Jose, M. Mutyam

Abstract: Network-on-Chip (NoC) provides a scalable communication interface for processing cores in large multicore systems. An efficient NoC router should not only minimize the average packet latency of the network but also have minimum pipeline latency, area, and power. Area and power overheads are affecting the scalability and popularity of traditional input buffered routers. In this context minimally buffered deflection routers are emerging as a cost effective alternative. We propose SLIDER, Smart Late Injection DEflection Router, that uses side buffers for accommodating a fraction of deflected flits. The main contributions of this work are smart late injection and selective flit preemption. In SLIDER the injection stage is kept at the end of the router pipeline. This reduces the contention in the arbitration stage, eliminates unwanted intra-router movement of flits and effectively utilizes the idle output channels. We parallelize independent operations in the router pipeline and reduce the pipeline latency by 25%. Experimental results on synthetic and real workloads show that SLIDER reduces average flit latency, channel wastage, and deflection rate, and increases throughput in the network when compared to the state-of-the-art minimally buffered deflection routers.
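
As a rough illustration of the idea of injecting after port allocation, the following Python sketch models one cycle of a generic bufferless deflection router. It is not SLIDER's microarchitecture: side buffers and flit preemption are not modeled, and the names `allocate_ports` and `late_inject`, the four-port layout, and the rule that a local flit is injected only when one of its productive ports is idle are all assumptions made for this example.

```python
PORTS = ("N", "E", "S", "W")  # output ports of a mesh router (assumed layout)

def allocate_ports(flits):
    """Assign an output port to every in-flight flit.

    flits: list of (flit_id, productive_ports), highest priority first,
    with at most len(PORTS) entries. A flit takes its first free
    productive port; otherwise it is deflected to any free port.
    Returns (assignment, idle_ports).
    """
    free = list(PORTS)
    assignment = {}
    for flit_id, productive in flits:
        port = next((p for p in productive if p in free), None)
        if port is None:
            port = free[0]  # deflection: no productive port is left
        free.remove(port)
        assignment[flit_id] = port
    return assignment, free

def late_inject(assignment, free, local_flit):
    """Inject a local flit after allocation, only into an idle productive port.

    Returns the port used, or None if the flit must wait (hypothetical policy).
    """
    flit_id, productive = local_flit
    port = next((p for p in productive if p in free), None)
    if port is not None:
        free.remove(port)
        assignment[flit_id] = port
    return port

# Three in-flight flits compete for the east port; a local flit wants west.
in_flight = [("a", ["E"]), ("b", ["E", "N"]), ("c", ["E"])]
assignment, idle = allocate_ports(in_flight)
injected_port = late_inject(assignment, idle, ("x", ["W"]))
```

Because the local flit is considered only after the in-flight flits have been placed, it never competes with them during arbitration and can only occupy a port that would otherwise stay idle.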

Citations: 13

Scattered superpage: A case for bridging the gap between superpage and page coloring

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657040

Licheng Chen, Yanan Wang, Zehan Cui, Yongbing Huang, Yungang Bao, Mingyu Chen

Abstract: Superpage and page coloring are two important practical techniques to improve the performance of Translation Lookaside Buffers (TLBs) and the shared Last Level Cache (LLC), respectively. However, there is a gap between these two techniques in current hardware-architecture design, which makes it contradictory to adopt both optimizations simultaneously: a superpage requires hundreds of contiguous base pages (e.g. a power of two) in both virtual and physical memory, which necessarily occupy all available page colors (or cache sets), thus preventing page coloring from working. This is because most contemporary architectures place the cache set index in the least significant part of the block address. In this paper, we propose a lightweight approach named Scattered Superpage to bridge this gap. Scattered Superpage decouples a superpage from the limitation of occupying multiple contiguous physical base pages. A superpage is still contiguous in virtual memory, but it is mapped in a scattered fashion across multiple physical superpages, and it occupies only specified page colors in each physical superpage, which allows us to configure the page colors for each superpage. The huge TLB is slightly modified to store the page color configuration for each superpage and to calculate the target physical address based on this configuration during address translation. The experimental results show that Scattered Superpage can improve system performance by 20.51% and reduce unfairness by 27.77% in our 4-core simulation system (with multi-program memory-intensive workloads). It achieves this by reducing last level cache misses by 17.05% and TLB misses by 86.02% simultaneously.
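
To make the conflict concrete, the Python sketch below counts page colors for a physically indexed cache and checks how many of them a physically contiguous superpage touches. The cache geometry (8 MiB, 16-way), the 4 KiB base page, and the 2 MiB superpage are illustrative assumptions, not the configuration evaluated in the paper, and the function names are hypothetical.

```python
def num_page_colors(cache_bytes, ways, page_bytes):
    """Page colors = bytes covered by one cache way divided by the page size."""
    way_bytes = cache_bytes // ways
    return max(1, way_bytes // page_bytes)

def page_color(phys_addr, cache_bytes, ways, page_bytes):
    """Color of the physical page that contains phys_addr."""
    colors = num_page_colors(cache_bytes, ways, page_bytes)
    return (phys_addr // page_bytes) % colors

def colors_touched(start_addr, region_bytes, cache_bytes, ways, page_bytes):
    """Set of colors used by a physically contiguous region."""
    return {
        page_color(addr, cache_bytes, ways, page_bytes)
        for addr in range(start_addr, start_addr + region_bytes, page_bytes)
    }

# Assumed example geometry (not taken from the paper).
CACHE = 8 * 1024 * 1024      # 8 MiB last-level cache
WAYS = 16                    # 16-way set associative
BASE_PAGE = 4 * 1024         # 4 KiB base page
SUPERPAGE = 2 * 1024 * 1024  # 2 MiB superpage

total_colors = num_page_colors(CACHE, WAYS, BASE_PAGE)
superpage_colors = colors_touched(0, SUPERPAGE, CACHE, WAYS, BASE_PAGE)
```

With these numbers one cache way covers 512 KiB, giving 128 colors, and the 512 contiguous base pages of a single superpage cycle through every one of them, so the operating system has no color left to choose.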

Citations: 5

Statistical analysis and modeling for error composition in approximate computation circuits

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657024

W. Chan, A. Kahng, Seokhyeong Kang, Rakesh Kumar, J. Sartori

Abstract: Aggressive requirements for low power and high performance in VLSI designs have led to increased interest in approximate computation. Approximate hardware modules can achieve improved energy efficiency compared to accurate hardware modules. While a number of previous works have proposed hardware modules for approximate arithmetic, these works focus on solitary approximate arithmetic operations. To utilize the benefit of approximate hardware modules, CAD tools should be able to quickly and accurately estimate the output quality of composed approximate designs. A previous work [10] proposes an interval-based approach for evaluating the output quality of certain approximate arithmetic designs. However, that approach uses sampled error distributions to store the characterization data of hardware, and its accuracy is limited by the number of intervals used during characterization. In this work, we propose an approach for output quality estimation of approximate designs that is based on a lookup table technique that characterizes the statistical properties of approximate hardware and a regression-based technique for composing statistics to formulate output quality. These two techniques improve the speed and accuracy for several error metrics over a set of multiply-accumulator testcases. Compared to the interval-based modeling approach of [10], our approach for estimating output quality of approximate designs is 3.75× more accurate for comparable runtime on the testcases and achieves 8.4× runtime reduction for the error composition flow. We also demonstrate that our approach is applicable to general testcases.

Citations: 51

Low-current probabilistic writes for power-efficient STT-RAM caches

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657095

Nikolaos Strikos, Vasileios Kontorinis, Xiangyu Dong, H. Homayoun, D. Tullsen

Citations: 15

Watts-inside: A hardware-software cooperative approach for Multicore Power Debugging

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657062

Jie Chen, Fan Yao, Guru Venkataramani

Citations: 9

Chisel-Q: Designing quantum circuits with a Scala embedded language

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657075

Xiao Liu, J. Kubiatowicz

Citations: 9

On dynamic polymorphing of a superscalar core for improving energy efficiency

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657091

S. Srinivasan, Rance Rodrigues, A. Annamalai, I. Koren, S. Kundu

Abstract: The computational needs of a program change over time. Sometimes a program exhibits low instruction level parallelism (ILP), while at other times the inherent ILP may be higher; sometimes a program stalls due to a large number of cache misses, while at other times it may exhibit high cache throughput. Asymmetric Multicore Processors (AMP) have been proposed to allow matching the computing needs of a thread to a core where it executes most efficiently. Some recent works focus on AMPs consisting of a monolithic large out-of-order (OOO) core and a small in-order (InO) core. Dynamic swapping of threads between these cores is then facilitated to improve the energy efficiency of the threads without impacting performance too negatively. Swapping decisions are made at coarse-grain instruction granularities to mitigate the impact of migration overhead. This excludes many opportunities for swapping at a fine-grained level. In this paper we consider a single superscalar OOO core that can morph itself dynamically into an InO core at runtime. In order to determine when to morph from OOO to InO and vice versa, we rely on certain hardware performance monitors. Using these performance monitors we estimate the energy-delay-squared product (ED2P) for both modes of operation, which is then used to make morphing decisions. The morphing hardware support is simple and is already available in certain Intel processors to facilitate debug. The proposed scheme has low migration overhead, which enables fine-grain morphing to achieve more energy-efficient computing by trading a small loss of performance for a much greater energy reduction.
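
The decision rule described above can be sketched in a few lines of Python, using the standard definition of the energy-delay-squared product, ED2P = E × D². The abstract does not say how energy and delay are derived from the performance monitors, so the estimates below are made-up numbers, and the names `ed2p` and `choose_mode` are hypothetical.

```python
def ed2p(energy_joules, delay_seconds):
    """Energy-delay-squared product: E * D^2 (lower is better)."""
    return energy_joules * delay_seconds ** 2

def choose_mode(estimates):
    """Pick the mode with the lowest estimated ED2P.

    estimates: dict mapping mode name -> (energy, delay) for the next interval.
    """
    return min(estimates, key=lambda mode: ed2p(*estimates[mode]))

# Illustrative estimates for one interval (assumed values, not measured data).
low_ilp_phase = {"OOO": (2.0, 1.0), "InO": (0.8, 1.4)}   # InO: 0.8 * 1.96 = 1.568
high_ilp_phase = {"OOO": (2.0, 1.0), "InO": (0.9, 1.6)}  # InO: 0.9 * 2.56 = 2.304

mode_low = choose_mode(low_ilp_phase)
mode_high = choose_mode(high_ilp_phase)
```

Squaring the delay weights performance more heavily than energy, so in-order mode is chosen only when its energy savings outweigh the squared slowdown, as in the first phase here but not the second.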

Citations: 0

Managing test coverage uncertainty due to thermal noise in nano-CMOS: A case-study on an SRAM array

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657043

Vikram B. Suresh, S. Kundu

Abstract: From system-on-a-chip to high performance processors, SRAM is a critical component. In highly scaled CMOS devices, process variation is a major concern as it affects SRAM stability, which often sets the floor on supply voltage and the ceiling on operating temperature of a semiconductor chip. Consequently, low-voltage and high-temperature testing are often part of the manufacturing test flow. In this paper, we show that for marginal cells, thermal noise is a major corrupting factor that affects the outcome of testing. A cell with large process variation which should ordinarily fail during memory test may pass due to the impact of thermal noise at high temperature. To address this uncertainty during testing, we propose a stochastic metric for test coverage. We also propose application of N-detect and Multi-level Word Line (WL) techniques to improve test coverage based on this stochastic metric. Simulation studies on 32nm PTM models indicate varying probability of faulty bit detection across the spectrum of random thermal noise that lead to erroneous test results. Multiple accesses to each bit cell during test increase the fault coverage from ~10% to a near-ideal 100%. Boosting WL voltage during read test and scaling it below nominal voltage during write test accelerates fault detection. Simulation of a 1KB SRAM array test case shows an improvement in fault coverage from ~88% to 100% by increasing the number of detects to 100.
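
The effect of repeated accesses can be illustrated with a simple probability model in Python. This is a minimal sketch that assumes each access detects a marginal cell independently with the same probability; the abstract does not define its stochastic coverage metric, so this is not the paper's model, and the 10% single-access probability and the function names are illustrative assumptions.

```python
def detection_probability(p_single, n_accesses):
    """Probability that at least one of n independent accesses exposes the fault."""
    return 1.0 - (1.0 - p_single) ** n_accesses

def accesses_needed(p_single, target):
    """Smallest number of accesses whose detection probability reaches target.

    Requires 0 < p_single <= 1 and 0 <= target < 1.
    """
    n = 1
    while detection_probability(p_single, n) < target:
        n += 1
    return n

P_SINGLE = 0.10  # assumed chance that one access detects a marginal cell

coverage_after_100 = detection_probability(P_SINGLE, 100)
accesses_for_99 = accesses_needed(P_SINGLE, 0.99)
```

Under the independence assumption, a cell that is caught only one time in ten by a single access is caught with probability above 99% after 44 accesses, which shows how an N-detect scheme can push coverage toward 100%.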

Citations: 3

Characterizing the costs and benefits of hardware parallelism in accelerator cores

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657021

Steven J. Battle, Mark Hempstead

Citations: 1

Free ECC: An efficient error protection for compressed last-level caches

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657054

Long Chen, Yanan Cao, Zhao Zhang

Citations: 18