2013 IEEE 31st International Conference on Computer Design (ICCD): Latest Papers, Page 4

Dynamic thread mapping for high-performance, power-efficient heterogeneous many-core systems

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657025

Guangshuo Liu, Jinpyo Park, Diana Marculescu

Abstract: This paper addresses the problem of dynamic thread mapping in heterogeneous many-core systems via an efficient algorithm that maximizes performance under power constraints. Heterogeneous many-core systems are composed of multiple core types with different power-performance characteristics. As is well documented in the literature, the generic mapping problem is NP-complete and can be formulated as a 0-1 integer linear program, which makes it prohibitively expensive to solve optimally online. In real applications, however, thread mapping decisions need to respond to workload phase changes. This paper proposes an iterative approach that bounds the runtime as O(n^2/m) for mapping multi-threaded applications on n cores comprising m core types. Compared with an optimal solution, the proposed algorithm produces results less than 0.6% away from the optimum on average, with a two-orders-of-magnitude improvement in runtime. Results show that the performance improvement can reach 16% under iso-power constraints compared to a random mapping. The algorithm can be brought online for hundred-core heterogeneous systems, as it scales to systems of 256 cores with less than one millisecond of overhead.

Citations: 62
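
The thread-mapping abstract above notes that the generic problem is a 0-1 assignment problem that is too expensive to solve optimally online. The following is a minimal sketch of that baseline only, not the paper's iterative algorithm: it exhaustively searches thread-to-core assignments under a power budget. The function name `best_mapping`, the `perf`/`power` tables, and the toy numbers are all hypothetical and chosen for illustration.

```python
from itertools import permutations


def best_mapping(perf, power, core_types, power_budget):
    """Exhaustively search one-thread-per-core assignments.

    perf[t][k] and power[t][k] are hypothetical throughput and power
    figures for thread t running on core type k; core_types[c] is the
    type index of physical core c. Returns (best_total_perf, cores),
    where cores[t] is the core assigned to thread t, or
    (-inf, None) when no assignment fits the power budget.
    """
    n_threads = len(perf)
    best = (float("-inf"), None)
    # Every ordered choice of distinct cores is one candidate mapping,
    # so the search space grows factorially with the core count.
    for cores in permutations(range(len(core_types)), n_threads):
        total_power = sum(power[t][core_types[c]] for t, c in enumerate(cores))
        if total_power > power_budget:
            continue  # violates the power constraint
        total_perf = sum(perf[t][core_types[c]] for t, c in enumerate(cores))
        if total_perf > best[0]:
            best = (total_perf, cores)
    return best


# Toy instance: core 0 is a "little" core (type 0), core 1 a "big" core (type 1).
perf = [[1.0, 2.0], [1.0, 1.2]]   # thread 0 benefits most from the big core
power = [[1.0, 3.0], [1.0, 3.0]]
print(best_mapping(perf, power, [0, 1], 4.0))  # (3.0, (1, 0))
```

Even this two-core toy enumerates every permutation, which suggests why an exact search does not scale to the hundred-core systems the abstract targets.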

Low power multi-level-cell resistive memory design with incomplete data mapping

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657035

Dimin Niu, Qiaosha Zou, Cong Xu, Yuan Xie

Abstract: Phase change memory (PCM) has been widely studied as a potential DRAM alternative. The multi-level cell (MLC) can further increase memory density and reduce fabrication cost by storing multiple bits in a single cell. Nevertheless, large write power, high write latency, and reliability issues resulting from resistance drift pose challenges for MLC PCM based memory design. In contrast, the emerging Resistive Random Access Memory (ReRAM), which has an MLC property similar to that of PCM, demonstrates better performance and energy efficiency than PCM. In addition, owing to the physical switching behavior of the ReRAM cell, the resistance drift phenomenon does not exist. In this paper, we propose a low-power MLC ReRAM design. We first study the programming method of MLC ReRAM and identify that programming latency and energy depend strongly on the data pattern written to the cell. Based on this observation, we propose incomplete data mapping (IDM), which maps an eight-level cell into six states to prevent the time- and energy-consuming data patterns from appearing in the cell. Furthermore, to improve the endurance of MLC ReRAM, which is much lower than that of single-level cell (SLC) ReRAM because of the complex programming method, we propose Dynamic Data ReMapping (DDRM) to selectively regulate memory blocks from the IDM state back to the complete data mapping (CDM) state. We demonstrate that the proposed design works effectively with existing error-correction schemes while requiring much smaller space overhead. Experimental results show that IDM can reduce energy consumption by up to 15% with negligible performance overhead. By combining DDRM with an existing error-correction scheme, memory lifetime improves by 2.75× compared with conventional memory architectures.

Citations: 42

Selected inversion for vectorless power grid verification by exploiting locality

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657051

Jianlei Yang, Yici Cai, Qiang Zhou, Wei Zhao

Abstract: Vectorless power grid verification is a practical approach for early-stage safety checks without input current patterns. The power grid is usually formulated as a linear system, and verifying it requires intensive matrix inversion and numerous linear programs, which is extremely time-consuming for large-scale power grids. In this paper, the power grid is represented using a domain-decomposition approach, and we propose a selected inversion technique to reduce the computational cost of matrix inversion for vectorless verification. The locality present in power grids is exploited to decide which blocks of the matrix inverse should be computed, while the remaining blocks are unnecessary. Vectorless verification can thus be performed purposefully with selected inversion, whereas previous direct approaches must perform full matrix inversion and then discard small entries to reduce the complexity of the linear programming. In addition, constraint locality is proposed to improve verification accuracy. Experimental results show that the proposed approach achieves significant speedups over previous approaches while preserving solution accuracy.

Citations: 2

Semi-analytical current source modeling of near-threshold operating logic cells considering process variations

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657079

Q. Xie, Tiansong Cui, Yanzhi Wang, Shahin Nazarian, Massoud Pedram

Citations: 4

Exploring the energy efficiency of Multispeculative Adders

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657058

Alberto A. Del Barrio, R. Hermida, S. Memik

Abstract: Variable Latency Adders are attracting strong interest for increasing performance at a low cost. However, most of the literature focuses on achieving a good area-delay tradeoff. In this paper we consider multispeculation as an alternative for designing adders with low energy consumption that also offer better performance than their non-speculative counterparts. Instead of introducing more logic to accelerate the computation, the adder is split into several fragments that operate in parallel and whose carry-in signals are provided by predictor units. On the one hand, the critical path of the module is shortened; on the other hand, the frequent useless glitches produced in the carry propagation structure are diminished. This translates into an overall energy reduction. Several experiments have been performed with linear and logarithmic adders, and results show energy savings of up to 90% and 70%, respectively, together with an additional decrease in execution time. Furthermore, when utilized in whole datapaths with current control techniques, it is possible to reduce execution time by 24.5% (34% best case) and energy by 32% (48% best case) on average.

Citations: 14
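
The multispeculative adder abstract above describes splitting an adder into fragments whose carry-in signals come from predictors. The following is a behavioral sketch of that idea in software, under loud assumptions: the abstract does not specify the predictor or the correction mechanism, so the static "predict 0" default and the sequential repair pass here are placeholders, and the name `multispeculative_add` is hypothetical.

```python
def multispeculative_add(a, b, width=16, frag=4, predict=lambda i: 0):
    """Add two unsigned `width`-bit integers using `frag`-bit fragments.

    `predict(i)` is a placeholder carry predictor returning 0 or 1 for
    fragment i (i >= 1). Returns a tuple
    (sum modulo 2**width, carry_out, number_of_mispredicted_fragments).
    """
    if width % frag:
        raise ValueError("width must be a multiple of frag")
    mask = (1 << frag) - 1
    n = width // frag
    xs = [(a >> (i * frag)) & mask for i in range(n)]
    ys = [(b >> (i * frag)) & mask for i in range(n)]

    # Phase 1: all fragments add independently with a guessed carry-in.
    # Fragment 0 has a known carry-in of 0, so it is never speculative.
    guesses = [0] + [predict(i) for i in range(1, n)]
    spec = [xs[i] + ys[i] + guesses[i] for i in range(n)]

    # Phase 2: compare each guess with the real carry; redo a fragment
    # only when its guess was wrong (this models the correction cost).
    result, carry, misses = 0, 0, 0
    for i in range(n):
        total = spec[i]
        if guesses[i] != carry:
            total = xs[i] + ys[i] + carry
            misses += 1
        result |= (total & mask) << (i * frag)
        carry = total >> frag
    return result, carry, misses


print(multispeculative_add(0x1234, 0x1111))  # (9029, 0, 0): no carries cross fragments
print(multispeculative_add(0x00FF, 0x0001))  # (256, 0, 2): two guesses needed repair
```

In this model the miss count stands in for the extra latency a variable-latency design would pay; a better predictor lowers it without changing the result.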

Register allocation and VDD-gating algorithms for out-of-order architectures

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657032

Steven J. Battle, Mark Hempstead

Citations: 1

Phoenix NoC: A distributed fault tolerant architecture

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657018

C. Marcon, Alexandre M. Amory, T. Webber, Thomas Volpato, L. Poehls

Citations: 9

Exploiting correlation in stochastic circuit design

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657023

Armin Alaghi, J. Hayes

Citations: 135

FlexiWay: A cache energy saving technique using fine-grained cache reconfiguration

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657031

Sparsh Mittal, Zhao Zhang, J. Vetter

Abstract: Recent trends in CMOS scaling and the use of large last-level caches (LLCs) have led to a significant increase in the leakage energy consumption of LLCs, and hence managing their energy consumption has become extremely important in modern processor design. Conventional cache energy saving techniques require offline profiling or provide only coarse-grained cache allocation. We present FlexiWay, a cache energy saving technique that uses dynamic cache reconfiguration. FlexiWay logically divides the cache sets into multiple (e.g., 16) modules and dynamically turns off a suitable, and possibly different, number of cache ways in each module. FlexiWay has very small implementation overhead and provides fine-grained cache allocation even with caches of typical associativity, e.g., an 8-way cache. Microarchitectural simulations have been performed using an x86-64 simulator and workloads from the SPEC2006 suite. FlexiWay has also been compared with two conventional energy saving techniques. The results show that FlexiWay provides the largest energy saving and incurs only a small loss in performance. For single-, dual-, and quad-core systems, the average energy savings using FlexiWay are 26.2%, 25.7%, and 22.4%, respectively.

Citations: 40
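
The FlexiWay abstract above describes dividing the cache sets into modules and disabling a possibly different number of ways in each. The following is a rough sketch of that bookkeeping only, assuming contiguous blocks of sets form a module; the abstract does not state how sets map to modules or how the way counts are chosen, so the class `ModularWayGating`, its methods, and the set-to-module mapping are hypothetical.

```python
class ModularWayGating:
    """Track how many ways are powered on in each module of a cache."""

    def __init__(self, n_sets=1024, n_ways=8, n_modules=16):
        if n_sets % n_modules:
            raise ValueError("n_sets must be a multiple of n_modules")
        self.n_sets = n_sets
        self.n_ways = n_ways
        self.n_modules = n_modules
        # Start fully powered: every module keeps all of its ways on.
        self.active_ways = [n_ways] * n_modules

    def module_of(self, set_index):
        # Assumption: sets are grouped into modules as contiguous blocks.
        return set_index * self.n_modules // self.n_sets

    def set_active_ways(self, module, ways):
        if not 1 <= ways <= self.n_ways:
            raise ValueError("ways must be between 1 and n_ways")
        self.active_ways[module] = ways

    def usable_ways(self, set_index):
        # Ways a lookup or fill may touch for this set.
        return range(self.active_ways[self.module_of(set_index)])

    def active_fraction(self):
        # Fraction of the cache that is still powered on.
        return sum(self.active_ways) / (self.n_ways * self.n_modules)


cache = ModularWayGating()
cache.set_active_ways(0, 2)    # a lightly used module keeps 2 of 8 ways
cache.set_active_ways(15, 4)
print(cache.active_fraction())  # 0.921875
```

Under these assumptions, 16 modules of an 8-way cache allow capacity to change in steps of 1/128 of the cache, compared with steps of 1/8 when ways are switched off for the whole cache at once. The policy that decides each module's way count is not modeled here.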

Performance simulator based on hardware resources constraints for ion trap quantum computer

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657073

Muhammad Ahsan, Byung-Soo Choi, Jungsang Kim

Abstract: Efforts to build quantum computers using ion traps have demonstrated all elementary qubit operations necessary for scalable implementation. Modular architectures have been proposed to construct modest-size quantum computers with up to 10^4 to 10^6 qubits using technologies that are available today (the MUSIQC architecture). Devising a concrete scheduling procedure to execute a given quantum algorithm on such hardware is a significant task, but existing quantum CAD tools generally do not account for the underlying connectivity of the qubits or the limits on the hardware resources available for scheduling. We present a scheduler and performance simulator that fully accounts for these resource constraints and can estimate the execution time and error performance of a quantum circuit on the hardware. We outline the construction of the tool components and describe the process of mapping the qubits to ions and scheduling the physical gates in the MUSIQC architecture. Using this tool, we quantify the trade-off between hardware resource constraints and performance of the computer and show that, at the expense of an x-fold increase in latency, a minimum of 1.6x resource reduction is possible for executing a three-qubit Bernstein-Vazirani algorithm encoded using the Steane code.

Citations: 5