2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)最新文献_第8页

A Novel 3D DRAM Memory Cube Architecture for Space Applications 一种新的空间应用3D DRAM记忆体架构

2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC) Pub Date : 2018-06-01 DOI: 10.1145/3195970.3195978

Anthony Agnesina, A. Sidana, James Yamaguchi, Christian Krutzik, J. Carson, J. Yang-Scharlotta, S. Lim

引用次数: 9

DAC 2018 Copyright Page DAC 2018版权页面

2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC) Pub Date : 2018-06-01 DOI: 10.1109/dac.2018.8465927

引用次数: 0

Parallelizing SRAM Arrays with Customized Bit-Cell for Binary Neural Networks 用于二进制神经网络的自定义位单元并行化SRAM阵列

2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC) Pub Date : 2018-06-01 DOI: 10.1145/3195970.3196089

Rui Liu, Xiaochen Peng, Xiaoyu Sun, W. Khwa, Xin Si, Jia-Jing Chen, Jia-Fang Li, Meng-Fan Chang, Shimeng Yu

{"title":"Parallelizing SRAM Arrays with Customized Bit-Cell for Binary Neural Networks","authors":"Rui Liu, Xiaochen Peng, Xiaoyu Sun, W. Khwa, Xin Si, Jia-Jing Chen, Jia-Fang Li, Meng-Fan Chang, Shimeng Yu","doi":"10.1145/3195970.3196089","DOIUrl":"https://doi.org/10.1145/3195970.3196089","url":null,"abstract":"Recent advances in deep neural networks (DNNs) have shown Binary Neural Networks (BNNs) are able to provide a reasonable accuracy on various image datasets with a significant reduction in computation and memory cost. In this paper, we explore two BNNs: hybrid BNN (HBNN) and XNOR-BNN, where the weights are binarized to +1/−1 while the neuron activations are binarized to 1/0 and +1/−1, respectively. Two SRAM bit cell designs are proposed, namely, 6T SRAM for HBNN and customized 8T SRAM for XNOR-BNN. In our design, the high-precision multiply-and-accumulate (MAC) is replaced by bitwise multiplication for HBNN or XNOR for XNOR-BNN plus bit-counting operations. To parallelize the weighted sum operation, we activate multiple word lines in the SRAM array simultaneously and digitize the analog voltage developed along the bit line by a multi-level sense amplifier (MLSA). In order to partition the large matrices in DNNs, we investigate the impact of sensing bit-levels of MLSA on the accuracy degradation for different sub-array sizes and propose using the nonlinear quantization technique to mitigate the accuracy degradation. With 64 × 64 sub-array size and 3-bit MLSA, HBNN and XNOR-BNN architectures can minimize the accuracy degradation to 2.37% and 0.88%, respectively, for an inspired VGG-16 network on the CIFAR-10 dataset. Design space exploration of SRAM based synaptic architectures with the conventional row-by-row access scheme and our proposed parallel access scheme are also performed, showing significant benefits in the area, latency and energy-efficiency. Finally, we have successfully taped-out and validated the proposed HBNN and XNOR-BNN designs in TSMC 65 nm process with measured silicon data, achieving energy-efficiency >100 TOPS/W for HBNN and >50 TOPS/W for XNOR-BNN.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"96 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75929362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 60

Measurement-Based Cache Representativeness on Multipath Programs 多路径程序中基于测量的缓存代表性

2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC) Pub Date : 2018-06-01 DOI: 10.1145/3195970.3196075

Suzana Milutinovic, J. Abella, E. Mezzetti, F. Cazorla

引用次数: 0

Closed yet Open DRAM: Achieving Low Latency and High Performance in DRAM Memory Systems 封闭而开放的DRAM:实现低延迟和高性能的DRAM存储系统

2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC) Pub Date : 2018-06-01 DOI: 10.1145/3195970.3196008

Lavanya Subramanian, Kaushik Vaidyanathan, Anant V. Nori, S. Subramoney, T. Karnik, Hong Wang

引用次数: 3

INVITED: Reducing Time and Effort in IC Implementation: A Roadmap of Challenges and Solutions 邀请:减少集成电路实施的时间和精力:挑战和解决方案的路线图

2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC) Pub Date : 2018-06-01 DOI: 10.1109/DAC.2018.8465871

A. Kahng

引用次数: 1

An Architecture-Agnostic Integer Linear Programming Approach to CGRA Mapping CGRA映射的体系结构不可知整数线性规划方法

2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC) Pub Date : 2018-06-01 DOI: 10.1145/3195970.3195986

S. Alexander Chin, Jason H. Anderson

引用次数: 53

Extracting Data Parallelism in Non-Stencil Kernel Computing by Optimally Coloring Folded Memory Conflict Graph 基于折叠内存冲突图优化着色的非模板核计算数据并行性提取

2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC) Pub Date : 2018-06-01 DOI: 10.1145/3195970.3196088

Juan Escobedo, Mingjie Lin

{"title":"Extracting Data Parallelism in Non-Stencil Kernel Computing by Optimally Coloring Folded Memory Conflict Graph","authors":"Juan Escobedo, Mingjie Lin","doi":"10.1145/3195970.3196088","DOIUrl":"https://doi.org/10.1145/3195970.3196088","url":null,"abstract":"Irregular memory access pattern in non-stencil kernel computing renders the well-known hyperplane- [1], lattice- [2], or tessellation-based [3] HLS techniques ineffective. We develop an elegant yet effective technique that synthesizes memory-optimal architecture from high level software code in order to maximize application-specific data parallelism. Our basic idea is to exploit graph structures embedded in data access pattern and computation structure in order to perform the memory banking that maximizes parallel memory accesses while conserving both hardware and energy consumption. Specifically, we priority color a weighted conflict graph generated from folding the fundamental conflict graph to maximize memory conflict reduction. Most interestingly, our graph-based methodology enables a straightforward tradeoff between the number of memory banks and minimizing memory conflicts.We empirically test our methodology with Vivado HLx 2015.4 on a standard Kintex-7 device for six benchmark computing kernels by measuring conflict reduction. In particular, our approach only require 9.56% LUT, 3.2% FF, 2.5% BRAM, and 11.33% DSP of the total available hardware resource to obtain a mapping function that achieves a 90% conflict reduction on a modified forward Gaussian elimination Kernel with 4 simultaneous memory accesses.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"82 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85598870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 4

A Machine Learning Framework to Identify Detailed Routing Short Violations from a Placed Netlist 从放置的网表中识别详细路由短违规的机器学习框架

2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC) Pub Date : 2018-06-01 DOI: 10.1145/3195970.3195975

Aysa Fakheri Tabrizi, L. Rakai, Nima Karimpour Darav, Ismail Bustany, L. Behjat, Shuchang Xu, A. Kennings

引用次数: 53

Subutai: Distributed Synchronization Primitives in NoC Interfaces for Legacy Parallel-Applications 遗留并行应用程序的NoC接口中的分布式同步原语

2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC) Pub Date : 2018-06-01 DOI: 10.1145/3195970.3196124

R. Cataldo, Ramon Fernandes, Kevin J. M. Martin, Martha Johanna Sepúlveda, A. Susin, C. Marcon, J. Diguet

{"title":"Subutai: Distributed Synchronization Primitives in NoC Interfaces for Legacy Parallel-Applications","authors":"R. Cataldo, Ramon Fernandes, Kevin J. M. Martin, Martha Johanna Sepúlveda, A. Susin, C. Marcon, J. Diguet","doi":"10.1145/3195970.3196124","DOIUrl":"https://doi.org/10.1145/3195970.3196124","url":null,"abstract":"Parallel applications are essential for efficiently using the computational power of a Multiprocessor System-on-Chip (MPSoC). Unfortunately, these applications do not scale effortlessly with the number of cores because of synchronization operations that take away valuable computational time and restrict the parallelization gains. Moreover, synchronization is also a bottleneck due to sequential access to shared memory. We address this issue and introduce ”Subutai”, a hardware/software (HW/SW) architecture designed to distribute essential synchronization mechanisms over the Network-on-Chip (NoC). It includes Network Interfaces (NIs), drivers and a custom library of a NoC-based MPSoC architecture that speeds up the essential synchronization primitives of any legacy parallel application. Besides, we provide a fast simulation tool for parallel applications and a HW architecture of the NI. Experimental results with PARSEC benchmark show an average application speedup of 2.05 compared to the same architecture running legacy SW solutions for 36% overhead of HW architecture.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"50 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78357846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 5