2013 IEEE 31st International Conference on Computer Design (ICCD)最新文献

筛选
英文 中文
SLIDER: Smart Late Injection DEflection Router for mesh NoCs 滑块:智能晚注入偏转路由器为网格noc
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657068
Bhawna Nayak, John Jose, M. Mutyam
{"title":"SLIDER: Smart Late Injection DEflection Router for mesh NoCs","authors":"Bhawna Nayak, John Jose, M. Mutyam","doi":"10.1109/ICCD.2013.6657068","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657068","url":null,"abstract":"Network-on-Chip (NoC) provides a scalable communication interface for processing cores in large multicore systems. An efficient NoC router should not only minimize the average packet latency of the network but also have minimum pipeline latency, area, and power. Area and power overheads are affecting the scalability and popularity of traditional input buffered routers. In this context minimally buffered deflection routers are emerging as a cost effective alternative. We propose SLIDER, Smart Late Injection DEflection Router, that uses side buffers for accommodating a fraction of deflected flits. The main contributions of this work are smart late injection and selective flit preemption. In SLIDER the injection stage is kept at the end of the router pipeline. This reduces the contention in the arbitration stage, eliminates unwanted intra-router movement of flits and effectively utilizes the idle output channels. We parallelize independent operations in the router pipeline and reduce the pipeline latency by 25%. Experimental results on synthetic and real workloads show that SLIDER reduces average flit latency, channel wastage, and deflection rate, and increases throughput in the network when compared to the state-of-the-art minimally buffered deflection routers.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116241205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 13
Scattered superpage: A case for bridging the gap between superpage and page coloring 分散的超级页:一个弥合超级页和页面着色之间差距的案例
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657040
Licheng Chen, Yanan Wang, Zehan Cui, Yongbing Huang, Yungang Bao, Mingyu Chen
{"title":"Scattered superpage: A case for bridging the gap between superpage and page coloring","authors":"Licheng Chen, Yanan Wang, Zehan Cui, Yongbing Huang, Yungang Bao, Mingyu Chen","doi":"10.1109/ICCD.2013.6657040","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657040","url":null,"abstract":"Superpage and page coloring are two important practical techniques to improve the performance of Translation Lookaside Buffers (TLBs) and shared Last Level Cache (LLC) respectively. However, there exists a gap between these two techniques in current hardware-architecture design, resulting in the contradiction in adopting these two optimizations simultaneously: a superpage requires hundreds of contiguous (e.g. a power of two) base pages in both virtual and physical memory, which would compulsorily occupy all available page colors (or cache sets), thus making page coloring failed to work. This is because most contemporary architecture adopts the design with cache set indexes placed in the least significant part of block address. In this paper, we propose a lightweight approach named Scattered Superpage to bridge this gap. Scattered Superpage decouples a superpage from the limitation of occupying multiple contiguous physical base pages. A superpage is still contiguous in virtual memory, but it is scattered mapping into multiple physical superpages, and it just occupies specified partial page colors in each physical superpage, thus it allows us to configure page color for each superpage. The huge TLB is slightly modified to store page color configuration for each superpage and to calculate target physical address based on this configuration when doing address translation. The experimental results show that the Scattered Superpage can improve system performance by 20.51% and reduce unfairness by 27.77% in our 4-core simulation system (with multi-program memory-intensive workloads). It achieves this by reducing last level cache miss by 17.05% and reducing TLB miss by 86.02% simultaneously.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134375383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
Statistical analysis and modeling for error composition in approximate computation circuits 近似计算电路误差组成的统计分析与建模
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657024
W. Chan, A. Kahng, Seokhyeong Kang, Rakesh Kumar, J. Sartori
{"title":"Statistical analysis and modeling for error composition in approximate computation circuits","authors":"W. Chan, A. Kahng, Seokhyeong Kang, Rakesh Kumar, J. Sartori","doi":"10.1109/ICCD.2013.6657024","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657024","url":null,"abstract":"Aggressive requirements for low power and high performance in VLSI designs have led to increased interest in approximate computation. Approximate hardware modules can achieve improved energy efficiency compared to accurate hardware modules. While a number of previous works have proposed hardware modules for approximate arithmetic, these works focus on solitary approximate arithmetic operations. To utilize the benefit of approximate hardware modules, CAD tools should be able to quickly and accurately estimate the output quality of composed approximate designs. A previous work [10] proposes an interval-based approach for evaluating the output quality of certain approximate arithmetic designs. However, their approach uses sampled error distributions to store the characterization data of hardware, and its accuracy is limited by the number of intervals used during characterization. In this work, we propose an approach for output quality estimation of approximate designs that is based on a lookup table technique that characterizes the statistical properties of approximate hardwares and a regression-based technique for composing statistics to formulate output quality. These two techniques improve the speed and accuracy for several error metrics over a set of multiply-accumulator testcases. Compared to the interval-based modeling approach of [10], our approach for estimating output quality of approximate designs is 3.75× more accurate for comparable runtime on the testcases and achieves 8.4× runtime reduction for the error composition flow. We also demonstrate that our approach is applicable to general testcases.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"133 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116418029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 51
Low-current probabilistic writes for power-efficient STT-RAM caches 低电流概率写入节能STT-RAM缓存
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657095
Nikolaos Strikos, Vasileios Kontorinis, Xiangyu Dong, H. Homayoun, D. Tullsen
{"title":"Low-current probabilistic writes for power-efficient STT-RAM caches","authors":"Nikolaos Strikos, Vasileios Kontorinis, Xiangyu Dong, H. Homayoun, D. Tullsen","doi":"10.1109/ICCD.2013.6657095","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657095","url":null,"abstract":"MRAM has emerged as one of the most attractive non-volatile solutions due to fast read access, low leakage power, high bit density, and long endurance. However, the high power consumption of write operations remains a barrier to the commercial adoption of MRAM technology. This paper addresses this problem by introducing low-current probabilistic writes (LCPW), a technique that reduces write access energy by lowering the amplitude of the write current pulse. Although low current pulses no longer guarantee successful bit write operations, we propose and evaluate a simple technique to ensure correctness and achieve significant power reduction over a typical MRAM implementation.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130498273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 15
Watts-inside: A hardware-software cooperative approach for Multicore Power Debugging 瓦特内部:多核电源调试的软硬件合作方法
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657062
Jie Chen, Fan Yao, Guru Venkataramani
{"title":"Watts-inside: A hardware-software cooperative approach for Multicore Power Debugging","authors":"Jie Chen, Fan Yao, Guru Venkataramani","doi":"10.1109/ICCD.2013.6657062","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657062","url":null,"abstract":"Multicore computing presents unique challenges for performance and power optimizations due to the multiplicity of cores and the complexity of interactions between the hardware resources. Understanding multicore power and its implications on application behavior is critical to the future of multicore software development. In this paper, we propose Watts-inside, a hardware-software cooperative framework that relies on the efficiency of hardware support to accurately gather application power profiles, and utilizes software support and causation principles for a more comprehensive understanding of application power. We show the design of our framework, along with certain optimizations that increase the ease of implementation. We present a case study using two real applications, Ocean (Splash-2) and Streamcluster (Parsec-1.0) where, with the help of feedback from Watts-inside framework, we made simple code modifications and realized up to 5% power savings on chip power consumption.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"2015 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121367757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 9
Chisel-Q: Designing quantum circuits with a scala embedded language 凿- q:用scala嵌入式语言设计量子电路
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657075
Xiao Liu, J. Kubiatowicz
{"title":"Chisel-Q: Designing quantum circuits with a scala embedded language","authors":"Xiao Liu, J. Kubiatowicz","doi":"10.1109/ICCD.2013.6657075","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657075","url":null,"abstract":"We introduce Chisel-Q, a high-level functional language for generating quantum circuits. Chisel-Q permits quantum computing algorithms to be constructed using the meta-language features of Scala and its embedded DSL Chisel. With Chisel-Q, designers of quantum computing algorithms gain access to high-level, modern language features and abstractions. We describe a synthesis flow that transforms Chisel-Q into an explicit quantum circuit in the Quantum Assembly Language (QASM) format. We also discuss several optimizations to reduce the generated hardware cost. The Chisel-Q tool includes resource and performance estimation which can be used to compare different implementations of the same functionality. We compare the output of the generic Chisel-Q synthesis flow with hand-tuned versions of well-known quantum circuits.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123612334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 9
On dynamic polymorphing of a superscalar core for improving energy efficiency 用于提高能效的超标量磁芯动态多晶化研究
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657091
S. Srinivasan, Rance Rodrigues, A. Annamalai, I. Koren, S. Kundu
{"title":"On dynamic polymorphing of a superscalar core for improving energy efficiency","authors":"S. Srinivasan, Rance Rodrigues, A. Annamalai, I. Koren, S. Kundu","doi":"10.1109/ICCD.2013.6657091","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657091","url":null,"abstract":"The computational needs of a program change over time. Sometimes a program exhibits low instruction level parallelism (ILP), while at other times the inherent ILP may be higher; sometimes a program stalls due to a large number of cache misses, while at other times it may exhibit high cache throughput. Asymmetric Multicore Processors (AMP) have been proposed to allow matching the computing needs of a thread to a core where it executes most efficiently. Some of the recent works focus on AMPs consisting of a monolithic large out-of-order (OOO) core and a small in-order (InO) core. Dynamic swapping of threads between these cores is then facilitated to improve energy efficiency of the threads without impacting performance too negatively. Swapping decisions are made at coarse grain instruction granularities to mitigate the impact of migration overhead. This excludes many opportunities for swap at a fine granular level. In this paper we consider a single superscalar OOO core that can morph itself dynamically into an InO core at runtime. In order to determine when to morph from OOO to InO and vice-versa, we rely on certain hardware performance monitors. Using these performance monitors we estimate the energy-delay-squared product (ED2P) for both modes of operation, which is then used to make morphing decisions. The morphing hardware support is simple and is already available in certain Intel processors to facilitate debug. The proposed scheme has low migration overhead, that enables fine-grain morphing to achieve more energy efficient computing by trading a small loss of performance for much greater energy reduction.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128378316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Managing test coverage uncertainty due to thermal noise in nano-CMOS: A case-study on an SRAM array 纳米cmos中热噪声引起的测试覆盖不确定性管理:SRAM阵列的案例研究
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657043
Vikram B. Suresh, S. Kundu
{"title":"Managing test coverage uncertainty due to thermal noise in nano-CMOS: A case-study on an SRAM array","authors":"Vikram B. Suresh, S. Kundu","doi":"10.1109/ICCD.2013.6657043","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657043","url":null,"abstract":"From system-on-a-chip to high performance processors, SRAM is a critical component. In highly scaled CMOS devices, process variation is a major concern as it affects SRAM stability which often sets the floor on supply voltage and the ceiling on operating temperature of a semiconductor chip. Consequently, low-voltage and high temperature testing are often part of manufacturing test flow. In this paper, we show that for marginal cells, thermal noise is a major corrupting factor that affects the outcome of testing. A cell with large process variation which should ordinarily fail during memory test may pass due to impact of thermal noise at high temperature. To address this uncertainty during testing, we propose a stochastic metric for test coverage. We also propose application of N-detect and Multi-level Word Line (WL) techniques to improve test coverage based on this stochastic metric. Simulation studies on 32nm PTM models indicate varying probability of faulty bit detection across the spectrum of random thermal noise that lead to erroneous test results. Multiple accesses to each bit cell during test increases the fault coverage from -10% to near ideal 100%. Boosting WL voltage during read test and scaling it below nominal voltage during write test accelerates fault detection. Simulation of a 1KB SRAM array test case shows an improvement in fault coverage from -88% to 100% by increasing the number of detects to 100.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128433647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Characterizing the costs and benefits of hardware parallelism in accelerator cores 描述加速器核心中硬件并行的成本和收益
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657021
Steven J. Battle, Mark Hempstead
{"title":"Characterizing the costs and benefits of hardware parallelism in accelerator cores","authors":"Steven J. Battle, Mark Hempstead","doi":"10.1109/ICCD.2013.6657021","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657021","url":null,"abstract":"Power and utilization constraints are limiting the performance gains of traditional architectures. Designers are increasingly embracing specialization to improve performance in the era of dark-silicon. General purpose processors are beginning to resemble SOC's from the embedded domain, and now include many specialized accelerator cores to improve computation-throughput while reducing the energy-cost of computation. The design-space of accelerator cores is wide and varied. Designers are able to specify how much parallelism to expose in hardware by varying input width, pipeline depth, number of compute-lanes, etc. In this paper we study three accelerator cores: DES, FFT, and Jacobi Transform, exhibiting three different types of computation: streaming cryptographic, butterfly DSP, and stencil. We investigate methods to increase parallelism within the accelerator while remaining on the pareto-frontier, and examine the trade-offs faced by designers with respect to area, power, and throughput. We present models of these trade-offs and provide insight into the design of cores under real-world constraints.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131919386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Free ECC: An efficient error protection for compressed last-level caches 免费ECC:为压缩的最后一级缓存提供有效的错误保护
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657054
Long Chen, Yanan Cao, Zhao Zhang
{"title":"Free ECC: An efficient error protection for compressed last-level caches","authors":"Long Chen, Yanan Cao, Zhao Zhang","doi":"10.1109/ICCD.2013.6657054","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657054","url":null,"abstract":"Cache reliability is increasingly a concern as cache cell dimension shrinks and cache capacity grows. Conventionally, an extra, dedicated storage is appended to cache to store error correcting code. Recently, cache compression schemes have been proposed to increase the effective cache capacity of last-level cache (LLC), for which we found the conventional cache ECC design is inefficient. We propose Free ECC that utilizes the unused fragments in compressed cache design to store ECC. It not only reduces the chip overhead but also improves cache utilization and power efficiency. Additionally, we propose an efficient convergent cache allocation scheme to organize the compressed data blocks more effectively than existing schemes. Our evaluation using SPEC CPU2006 and PARSEC benchmarks shows that the Free ECC design improves cache capacity utilization and power efficiency significantly, with negligible overhead on overall performance. This new design makes compressed cache an increasingly viable choice for processors with requirements of high reliability.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134279202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 18
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信