2016 IEEE 34th International Conference on Computer Design (ICCD): Latest Publications

CNN-MERP: An FPGA-based memory-efficient reconfigurable processor for forward and backward propagation of convolutional neural networks

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-11-22 DOI: 10.1109/ICCD.2016.7753296

Xushen Han, Dajiang Zhou, Shihao Wang, S. Kimura

Abstract: Large-scale deep convolutional neural networks (CNNs) are widely used in machine learning applications. While CNNs involve huge complexity, VLSI (ASIC and FPGA) chips that deliver high-density integration of computational resources are regarded as a promising platform for CNN implementation. At massive parallelism of computational units, however, the external memory bandwidth, which is constrained by the pin count of the VLSI chip, becomes the system bottleneck. Moreover, VLSI solutions are usually regarded as lacking the flexibility to be reconfigured for the various parameters of CNNs. This paper presents CNN-MERP to address these issues. CNN-MERP incorporates an efficient memory hierarchy that significantly reduces the bandwidth requirements through multiple optimizations including on/off-chip data allocation, data flow optimization and data reuse. The proposed 2-level reconfigurability, which is based on the control logic and the multiboot feature of the FPGA, enables fast and efficient reconfiguration. As a result, an external memory bandwidth requirement of 1.94MB/GFlop is achieved, which is 55% lower than prior art. Under limited DRAM bandwidth, a system throughput of 1244GFlop/s is achieved on the Virtex UltraScale platform, which is 5.48 times higher than state-of-the-art FPGA implementations.

Citations: 32
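
The two headline figures in the CNN-MERP abstract can be combined into a rough estimate of the external memory traffic the design sustains at peak throughput. The following is a back-of-the-envelope sketch, not a calculation taken from the paper: the function names are hypothetical, and the implied prior-art figure assumes that "55% lower" means the new requirement is 45% of the baseline.

```python
def required_bandwidth_mb_per_s(mb_per_gflop: float, gflop_per_s: float) -> float:
    """External memory traffic implied by a per-GFlop requirement at a given throughput."""
    return mb_per_gflop * gflop_per_s


def implied_baseline(mb_per_gflop: float, reduction: float) -> float:
    """Baseline requirement if the reported value is `reduction` (e.g. 0.55) lower.

    Assumption: "55% lower" is read as new = baseline * (1 - 0.55).
    """
    return mb_per_gflop / (1.0 - reduction)


if __name__ == "__main__":
    # Figures quoted in the abstract: 1.94 MB/GFlop and 1244 GFlop/s.
    bandwidth = required_bandwidth_mb_per_s(1.94, 1244.0)
    baseline = implied_baseline(1.94, 0.55)
    print(f"Estimated DRAM traffic at peak throughput: {bandwidth:.2f} MB/s")
    print(f"Implied prior-art requirement: {baseline:.2f} MB/GFlop")
```

Under these assumptions the design would need roughly 2.4 GB/s of external bandwidth at peak throughput, which gives a sense of why the per-GFlop requirement matters under limited DRAM bandwidth.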

VARIUS-TC: A modular architecture-level model of parametric variation for thin-channel switches

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-11-22 DOI: 10.1109/ICCD.2016.7753353

S. K. Khatamifard, M. Resch, N. Kim, Ulya R. Karpuzcu

Citations: 15

How logic masking can improve path delay analysis for Hardware Trojan detection

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-02 DOI: 10.1109/ICCD.2016.7753319

Arash Nejat, D. Hély, V. Beroulle

Citations: 3

A readback based general debugging framework for soft-core processors

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-02 DOI: 10.1109/ICCD.2016.7753342

Changgong Li, Alexander Schwarz, C. Hochberger

Citations: 4

Dynamic prefetcher reconfiguration for diverse memory architectures

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753270

Junghoon Lee, Taehoon Kim, Jaehyuk Huh

Abstract: With the advent of stacked memory and new memory architectures, the heterogeneity of memory has been increasing. Among these diverse memory technologies, each memory architecture has its own advantages and weaknesses. Considering the trade-offs, future systems are expected to support multiple memory architectures with a hybrid memory system. However, such diversity of memory architectures complicates the performance optimization of the on-chip memory hierarchy. One of the key components affected by this trend is the hardware prefetcher. The available memory bandwidth strongly affects the effectiveness of prefetchers, and the aggressiveness of prefetchers must be tuned for memory architectures as well as application behaviors. This paper investigates the effect of memory diversity on prefetcher parameter selection, and proposes a dynamic parameter search mechanism to adjust the prefetch aggressiveness under various memory architectures. Using a general hill climbing scheme periodically, the mechanism adapts to the memory architectures and application behaviors effectively. In addition to such automatic tuning, the study improves the solution for cache pollution exacerbated by the increase of speculative data from more aggressive prefetchers in higher bandwidth memory. With the dynamic parameter search and pollution mitigation, the proposed framework improves the performance of applications by 12.4% on average compared to the prior scheme for tuning prefetch parameters.

Citations: 3
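
The abstract above describes a periodic hill climbing search over prefetch aggressiveness. A minimal sketch of generic one-dimensional hill climbing is given here for orientation; it is not the paper's mechanism. The integer aggressiveness level, the `measure` callback (standing in for a per-epoch performance sample such as IPC), and the function name are all assumptions made for illustration.

```python
from typing import Callable


def tune_aggressiveness(
    measure: Callable[[int], float],  # hypothetical: performance observed at a level
    start: int,
    lo: int,
    hi: int,
    epochs: int,
) -> int:
    """Generic hill climbing over an integer aggressiveness level in [lo, hi]."""
    level = start
    best = measure(level)
    step = 1  # current search direction: +1 is more aggressive, -1 is less
    for _ in range(epochs):
        candidate = min(hi, max(lo, level + step))
        if candidate == level:
            # Hit a boundary of the parameter range: try the other direction.
            step = -step
            continue
        score = measure(candidate)
        if score > best:
            # The neighbour performed better, so move there and keep going.
            level, best = candidate, score
        else:
            # No improvement: reverse direction for the next epoch.
            step = -step
    return level


if __name__ == "__main__":
    # Toy performance curves (made up): the best level differs per memory type.
    def low_bandwidth(level: int) -> float:
        return -float((level - 3) ** 2)

    def high_bandwidth(level: int) -> float:
        return -float((level - 7) ** 2)

    print(tune_aggressiveness(low_bandwidth, start=1, lo=0, hi=8, epochs=20))
    print(tune_aggressiveness(high_bandwidth, start=1, lo=0, hi=8, epochs=20))
```

One simplification to note: this sketch compares against a stored best score, whereas a real periodic tuner would presumably re-measure the current setting each epoch, since application phases and memory conditions change over time.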

Algorithms for CPU and DRAM DVFS under inefficiency constraints

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753276

R. Begum, Mark Hempstead, Guru Prasad Srinivasa, Geoffrey Challen

Abstract: Dynamic voltage and frequency scaling (DVFS) of both the core and DRAM provides opportunities to trade off performance in order to save energy. Previous approaches to core and DRAM power management using DVFS used performance, specifically acceptable performance loss, as a constraint. We present energy management algorithms that coordinate core and DRAM frequency scaling under a specified energy budget. Approaches that work under performance constraints, as we will show, are not directly applicable to systems operating under energy constraints, as it is difficult to calculate the correct performance bounds in real-time to stay under an energy budget. Setting arbitrary energy budgets for a diverse set of applications can be harmful to application performance. We use the previously introduced concept of Inefficiency (the additional amount of energy above the minimum required energy that can be used to improve performance) to provide a dynamic energy constraint to our system. We introduce new power management algorithms that search the power and performance space to find the best performing point under this constraint. We demonstrate the efficacy of our algorithms using CPU DVFS and DRAM frequency scaling. We show that our algorithms have 24% lower tuning cost and save up to 5% energy with a small performance loss compared to a state-of-the-art performance constrained system.

Citations: 8
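
The idea of picking the best performing point under an inefficiency constraint can be illustrated with a small exhaustive search. This is only a sketch under stated assumptions and not the authors' algorithm: inefficiency is modelled here as the ratio of a setting's energy to the minimum energy over all settings, the `Setting` record and `best_under_inefficiency` function are hypothetical, and the time and energy numbers are invented for the example.

```python
from typing import NamedTuple, Sequence


class Setting(NamedTuple):
    cpu_mhz: int
    mem_mhz: int
    time_s: float    # execution time at this setting (made-up values below)
    energy_j: float  # energy consumed at this setting (made-up values below)


def best_under_inefficiency(settings: Sequence[Setting], budget: float) -> Setting:
    """Return the fastest setting whose energy / minimum energy is within `budget`.

    Assumption: inefficiency is the ratio E / E_min, so a budget of 1.0 allows
    only minimum-energy settings and 1.3 allows up to 30% extra energy.
    """
    if budget < 1.0:
        raise ValueError("an inefficiency budget below 1.0 cannot be satisfied")
    e_min = min(s.energy_j for s in settings)
    feasible = [s for s in settings if s.energy_j / e_min <= budget]
    return min(feasible, key=lambda s: s.time_s)


if __name__ == "__main__":
    table = [
        Setting(500, 200, 10.0, 8.0),
        Setting(1000, 200, 6.0, 9.0),
        Setting(500, 800, 8.0, 10.0),
        Setting(1000, 800, 4.0, 12.0),
    ]
    for budget in (1.0, 1.3, 1.5):
        print(budget, best_under_inefficiency(table, budget))
```

An exhaustive scan is used only because the example table is tiny; the abstract's emphasis on lower tuning cost suggests the paper's algorithms avoid visiting every setting.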

Wireless Network-on-Chip analysis of propagation technique for on-chip communication

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753313

Vasil Pano, I. Yilmaz, Yuqiao Liu, B. Taskin, K. Dandekar

Abstract: Network-on-Chip (NoC) is a communication paradigm capable of facilitating a scalable interconnection infrastructure for multi-core processors. Wireless NoCs have been introduced to improve the communication performance over long-distance processing nodes. Current on-chip antennas used in wireless NoCs communicate predominantly through surface waves, where the efficacy of the wireless nodes is partially determined by the radiation efficiency and transmission gain limited due to the conductivity loss of the silicon substrate. Recently, an on-chip propagation technique of radio waves was introduced, through the un-doped silicon layer as opposed to surface-waves prevalent in literature. The through-substrate propagation waves provide a unique solution to overcome the challenge of long-distance communication between processing nodes. In this work, overall improvements are shown compared to traditional wireless NoCs with the placement of antennas on undoped silicon (i.e. communicating through surface waves), simulated in NoC architectures across performance metrics of area, power consumption and latency.

Citations: 2

Power-aware virtual machine mapping in the data-center-on-a-chip paradigm

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753286

X. Lin, Yuankun Xue, P. Bogdan, Yanzhi Wang, S. Garg, Massoud Pedram

Citations: 1

Data-Pattern enabled Self-Recovery multimedia storage system for near-threshold computing

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753332

Na Gong, J. Edstrom, Dongliang Chen, Jinhui Wang

Abstract: The growing popularity of powerful mobile devices such as smart phones and tablet devices has resulted in exponential growth of demand for video applications. However, due to the intensive computation of the video decoding process, mobile video applications require frequent embedded memory access, which consumes a large amount of power and limits battery life. Various low-voltage memory techniques have been investigated to enhance the energy efficiency of multimedia processing systems. Unfortunately, the existing research suffers from high implementation complexity and large area overhead. In this paper, we present a low-cost self-recovery video storage system by investigating meaningful data patterns hidden in mobile video data. Specifically, we propose a two-dimensional data-pattern approach to explore horizontal data-association and vertical data-correlation characteristics. Based on the identified optimal data patterns, we present a simple circuit-level SRAM design to enable self-recovery at low voltages. A 45nm 32kb SRAM is designed that delivers good video quality at near-threshold voltage (0.5 V) with negligible area overhead (3.97%).

Citations: 3

Error behaviors testing with temperature and magnetism dependency for MRAM

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753302

Xin Shi, Fei Wu, Xidong Guan, C. Xie

Citations: 3