2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC): Latest Papers, Page 8

A Write-friendly Arithmetic Coding Scheme for Achieving Energy-Efficient Non-Volatile Memory Systems

2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date : 2021-01-18 DOI: 10.1145/3394885.3431511

Yi-Shen Chen, Chun-Feng Wu, Yuan-Hao Chang, Tei-Wei Kuo

Abstract: In the era of the Internet of Things (IoT), wearable IoT devices have become popular and closely tied to our lives. Most of these devices are based on embedded systems that have to operate on limited energy resources, such as batteries or energy harvesters. Therefore, energy efficiency is one of the critical issues for these devices. To reduce energy consumption by reducing the total number of accesses to the memory and storage layers, storage-class memory (SCM) and data compression techniques are applied to eliminate data movement and shrink the data size, respectively. However, the information gap between them hinders cooperation between the two techniques and prevents further optimization of energy consumption. This work proposes a write-friendly arithmetic coding scheme that jointly manages both techniques to achieve energy-efficient non-volatile memory (NVM) systems. In particular, the concept of "ignorable bits" is introduced to skip further write operations while storing the compressed data into SCM devices. The proposed design was evaluated by a series of intensive experiments, and the results are encouraging.
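The idea of skipping writes can be illustrated with a minimal sketch. It assumes a memory that only programs cells whose value changes, and it models "ignorable bits" as a hypothetical mask of positions whose stored value does not matter. The abstract does not define how the paper identifies such bits, so the function name, the mask, and the 8-bit word width are all assumptions made for illustration.

```python
def bits_to_write(old: int, new: int, ignorable_mask: int = 0, width: int = 8) -> int:
    """Count the cells that must be programmed when `new` overwrites `old`.

    `ignorable_mask` is a hypothetical mask: a 1 marks a position whose
    stored value is a don't-care, so it never needs to be written.
    """
    full = (1 << width) - 1
    changed = (old ^ new) & full                   # positions where old and new differ
    must_write = changed & ~ignorable_mask & full  # drop the don't-care positions
    return bin(must_write).count("1")


old_word = 0b10110010
new_word = 0b11100110
print(bits_to_write(old_word, new_word))              # every differing bit is written
print(bits_to_write(old_word, new_word, 0b00001111))  # low nibble treated as ignorable
```

The more positions a coder can mark as ignorable, the fewer cells are programmed per write, which is where the energy saving would come from in this simplified model.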

Citations: 7

28GHz Phase Shifter with Temperature Compensation for 5G NR Phased-array Transceiver

2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date : 2021-01-18 DOI: 10.1145/3394885.3431650

Yi Zhang, Jian Pang, Kiyoshi Yanagizawa, A. Shirane, K. Okada

Citations: 1

Mixed Precision Quantization for ReRAM-based DNN Inference Accelerators

2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date : 2021-01-18 DOI: 10.1145/3394885.3431554

Sitao Huang, Aayush Ankit, P. Silveira, Rodrigo Antunes, S. R. Chalamalasetti, I. E. Hajj, Dong Eun Kim, G. Aguiar, P. Bruel, S. Serebryakov, Cong Xu, Can Li, P. Faraboschi, J. Strachan, Deming Chen, K. Roy, Wen-mei W. Hwu, D. Milojicic

Abstract: ReRAM-based accelerators have shown great potential for accelerating DNN inference because ReRAM crossbars can perform analog matrix-vector multiplication (MVM) operations with low latency and energy consumption. However, these crossbars require the use of ADCs, which constitute a significant fraction of the cost of MVM operations. The overhead of ADCs can be mitigated via partial sum quantization. However, prior quantization flows for DNN inference accelerators do not consider partial sum quantization, which is not highly relevant to traditional digital architectures. To address this issue, we propose a mixed precision quantization scheme for ReRAM-based DNN inference accelerators where weight quantization, input quantization, and partial sum quantization are jointly applied for each DNN layer. We also propose an automated quantization flow powered by deep reinforcement learning to search for the best quantization configuration in the large design space. Our evaluation shows that the proposed mixed precision quantization scheme and quantization flow reduce inference latency and energy consumption by up to 3.89× and 4.84×, respectively, while only losing 1.18% in DNN inference accuracy.
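To make partial sum quantization concrete, the following sketch models an ADC as a uniform signed quantizer applied to each tile's partial sum before accumulation. This is a generic illustration, not the authors' quantization flow: the function names, the tile size of 4, and the full-scale value of 4.0 are assumptions, and the reinforcement-learning search over bit widths is not shown.

```python
def quantize(value: float, bits: int, full_scale: float) -> float:
    """Uniform signed quantization to `bits` bits over [-full_scale, full_scale)."""
    levels = 2 ** (bits - 1)          # number of steps on each side of zero
    step = full_scale / levels
    code = round(value / step)
    code = max(-levels, min(levels - 1, code))  # saturate like a real converter
    return code * step


def crossbar_dot(weights, inputs, adc_bits, tile=4, full_scale=4.0):
    """Dot product computed tile by tile; each partial sum passes through the ADC model."""
    total = 0.0
    for start in range(0, len(weights), tile):
        w_tile = weights[start:start + tile]
        x_tile = inputs[start:start + tile]
        partial = sum(w * x for w, x in zip(w_tile, x_tile))
        total += quantize(partial, adc_bits, full_scale)
    return total


weights = [1.0] * 8
inputs = [0.3] * 8
print(crossbar_dot(weights, inputs, adc_bits=3))  # coarse ADC
print(crossbar_dot(weights, inputs, adc_bits=8))  # finer ADC, closer to the exact 2.4
```

Lowering `adc_bits` cheapens the conversion but coarsens every partial sum, which is the accuracy versus cost trade-off that a per-layer bit-width search has to balance.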

Citations: 30

Dynamic Neural Network to Enable Run-Time Trade-off between Accuracy and Latency

2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date : 2021-01-18 DOI: 10.1145/3394885.3431628

Li Yang, Deliang Fan

Abstract: To deploy powerful deep neural networks (DNNs) on smart but resource-limited IoT devices, many prior works have proposed compressing DNNs to reduce network size and computational complexity with negligible accuracy degradation, using techniques such as weight quantization, network pruning, and convolution decomposition. However, conventional DNN compression methods generate a smaller but fixed network from a relatively large background model to achieve resource-limited hardware acceleration. Such optimization lacks the ability to adjust its structure in real time to adapt to dynamic hardware resource allocation and workloads. In this paper, we mainly review our two prior works [13], [15] that tackle this challenge, discussing how to construct a dynamic DNN by means of either uniform or non-uniform sub-net generation methods. Moreover, to generate multiple non-uniform sub-nets, [15] needs to fully retrain the background model for each sub-net individually, an approach called the multi-path method. To reduce the training cost, in this work we further propose a single-path sub-net generation method that can sample multiple sub-nets in different epochs within one training round. The constructed dynamic DNN, consisting of multiple sub-nets, provides the ability to trade off inference accuracy and latency at run time according to hardware resources and environment requirements. Finally, we study dynamic DNNs with different sub-net generation methods on both the CIFAR-10 and ImageNet datasets. We also present the run-time tuning of accuracy and latency on both GPU and CPU.
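The run-time trade-off can be pictured as a selection among pre-built sub-nets. The sketch below is an assumption about how such a selector might look, not the authors' implementation: the `SubNet` type, the policy, and every latency and accuracy number are hypothetical placeholders rather than results from the paper.

```python
from typing import NamedTuple, Sequence


class SubNet(NamedTuple):
    name: str
    latency_ms: float  # hypothetical profiled latency on the target device
    accuracy: float    # hypothetical validation accuracy


def pick_subnet(subnets: Sequence[SubNet], latency_budget_ms: float) -> SubNet:
    """Return the most accurate sub-net that fits the budget, else the fastest one."""
    feasible = [s for s in subnets if s.latency_ms <= latency_budget_ms]
    if not feasible:
        return min(subnets, key=lambda s: s.latency_ms)
    return max(feasible, key=lambda s: s.accuracy)


# Placeholder profile table, invented purely for illustration.
profiles = [
    SubNet("quarter-width", 5.0, 0.80),
    SubNet("half-width", 9.0, 0.88),
    SubNet("full-width", 20.0, 0.92),
]
print(pick_subnet(profiles, latency_budget_ms=10.0).name)
print(pick_subnet(profiles, latency_budget_ms=1.0).name)
```

Because all sub-nets share one trained model, switching between them in this picture is a lookup rather than a reload, which is what makes the trade-off usable at run time.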

Citations: 2

A Novel Technology Mapper for Complex Universal Gates

2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date : 2021-01-18 DOI: 10.1145/3394885.3431561

Meng-Che Wu, A. Dao, Mark Po-Hung Lin

Citations: 0

Combining Memory Partitioning and Subtask Generation for Parallel Data Access on CGRAs

2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date : 2021-01-18 DOI: 10.1145/3394885.3431414

Cheng Li, Jiangyuan Gu, S. Yin, Leibo Liu, Shaojun Wei

Abstract: Coarse-Grained Reconfigurable Architectures (CGRAs) are attractive reconfigurable platforms with the advantages of high performance and power efficiency. In a CGRA-based computing system, the computations are often mapped onto the CGRA with parallel memory accesses. To fully exploit the on-chip memory bandwidth, memory partitioning algorithms are widely used to reduce access conflicts. CGRAs have a fixed storage fabric and limited memory size due to severe area constraints. Previous memory partitioning algorithms assumed that data could be completely transferred into the target memory. However, in practice, we often encounter situations where on-chip storage is insufficient to store the complete data. In order to perform the computation of these applications in the memory-limited CGRA, we first develop a memory partitioning strategy with continual placement, which can also avoid data preprocessing, and then divide the kernel into multiple subtasks that suit the size of the target memory. Experimental results show that, compared to the state-of-the-art method, our approach achieves a 43.2% reduction in data preparation time and an 18.5% improvement in overall performance. If the subtask generation scheme is adopted, our approach can achieve a 14.4% overall performance improvement while reducing memory requirements by 99.7%.
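As background for how memory partitioning reduces access conflicts, the sketch below uses plain cyclic partitioning (bank = address mod number of banks) and searches for the smallest bank count at which a set of simultaneous accesses never collides. This is a textbook baseline chosen for illustration, not the continual-placement strategy the paper proposes; the function names and the search limits are assumptions.

```python
def bank_of(address: int, num_banks: int) -> int:
    """Cyclic partitioning: consecutive addresses go to consecutive banks."""
    return address % num_banks


def is_conflict_free(offsets, num_banks, base_range=range(64)):
    """True if accesses at base + each offset hit distinct banks for every base tried."""
    for base in base_range:
        banks = {bank_of(base + off, num_banks) for off in offsets}
        if len(banks) < len(offsets):
            return False
    return True


def smallest_conflict_free_banks(offsets, max_banks=16):
    """Smallest bank count (up to max_banks) that serves all offsets in parallel."""
    for n in range(len(offsets), max_banks + 1):
        if is_conflict_free(offsets, n):
            return n
    return None


print(smallest_conflict_free_banks((0, 1, 2)))  # unit-stride window
print(smallest_conflict_free_banks((0, 3, 6)))  # stride-3 window
```

A stride that shares a factor with the bank count maps several accesses to the same bank, which is why the stride-3 pattern needs more banks than it has accesses.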

Citations: 1

Standard Cell Routing with Reinforcement Learning and Genetic Algorithm in Advanced Technology Nodes

2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date : 2021-01-18 DOI: 10.1145/3394885.3431569

Haoxing Ren, Matthew R. Fojtik

Citations: 9

A Hierarchical Assessment Strategy on Soft Error Propagation in Deep Learning Controller

2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date : 2021-01-18 DOI: 10.1145/3394885.3431573

Ting Liu, Yuzhuo Fu, Yan Zhang, Bin Shi

Abstract: Deep learning techniques have been introduced into the field of intelligent controller design in recent years and have become an effective alternative in complex control scenarios. In addition to improving control robustness, deep learning controllers (DLCs) also provide potential fault tolerance to internal disturbances (such as soft errors) due to the inherently redundant structure of deep neural networks (DNNs). In this paper, we propose a hierarchical assessment to characterize the impact of soft errors on the dependability of a PID controller and its DLC alternative. Single-bit-flip injections in the underlying hardware and time series data collection from multiple abstraction layers (ALs) are performed on a virtual prototype system based on an ARM Cortex-A9 CPU, with a PID controller and a corresponding DLC implemented as a recurrent neural network (RNN) deployed on it. We employ generative adversarial networks and Bayesian networks to characterize the local and global dependencies caused by soft errors across the system. By analyzing cross-AL fault propagation paths and component sensitivities, we discover that the parallel data processing pipelines and regular feature size scaling mechanism in the DLC can effectively prevent critical failure-causing faults from propagating to the control output.
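A single-bit flip is easy to emulate in software, which helps show why the position of the flipped bit matters. The sketch below flips one bit in the IEEE 754 single-precision encoding of a value. It is a software-level illustration only: the paper injects faults into the hardware of a virtual prototype, and the helper name here is an assumption.

```python
import struct


def flip_bit_float32(value: float, bit: int) -> float:
    """Return `value` with one bit of its 32-bit IEEE 754 encoding inverted.

    Bit 31 is the sign, bits 30..23 the exponent, bits 22..0 the mantissa.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 31]")
    (raw,) = struct.unpack("<I", struct.pack("<f", value))  # float -> raw bits
    (faulty,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return faulty


for bit in (0, 23, 30, 31):
    print(bit, flip_bit_float32(1.0, bit))
```

A flip in a low mantissa bit barely perturbs the value, while a flip in a high exponent bit changes it by many orders of magnitude, so fault sensitivity depends strongly on where the upset lands.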

Citations: 1

Zero Correlation Error: A Metric for Finite-Length Bitstream Independence in Stochastic Computing

2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date : 2021-01-18 DOI: 10.1145/3394885.3431552

Hsuan Hsiao, Joshua San Miguel, Yuko Hara-Azumi, J. Anderson

Citations: 3

One-pass Synthesis for Field-coupled Nanocomputing Technologies

2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date : 2021-01-18 DOI: 10.1145/3394885.3431607

Marcel Walter, Winston Haaswijk, R. Wille, F. Sill, R. Drechsler

Abstract: Field-coupled Nanocomputing (FCN) is a class of post-CMOS emerging technologies that promises to overcome certain physical limitations of conventional solutions such as CMOS by allowing for high computational throughput with low power dissipation. Despite this promise, the design of corresponding FCN circuits is still in its infancy. In fact, state-of-the-art solutions still rely heavily on conventional synthesis approaches that do not take the tight physical constraints of FCN circuits (particularly with respect to routability and clocking) into account. Instead, physical design is conducted in a second step in which a classical logic network is mapped onto an FCN layout. Using this two-stage approach, with a classical and FCN-oblivious logic network as an intermediate result, frequently leads to substantial quality loss or completely impractical results. In this work, we propose a one-pass synthesis scheme for FCN circuits, which conducts both steps, synthesis and physical design, in a single run. For the first time, this allows the generation of exact, i.e., minimal, FCN circuits for a given functionality.

Citations: 8