Yi-Shen Chen, Chun-Feng Wu, Yuan-Hao Chang, Tei-Wei Kuo
{"title":"A Write-friendly Arithmetic Coding Scheme for Achieving Energy-Efficient Non-Volatile Memory Systems","authors":"Yi-Shen Chen, Chun-Feng Wu, Yuan-Hao Chang, Tei-Wei Kuo","doi":"10.1145/3394885.3431511","DOIUrl":"https://doi.org/10.1145/3394885.3431511","url":null,"abstract":"In the era of the Internet of Things (IoT), wearable IoT devices have become popular and closely tied to our daily lives. Most of these devices are based on embedded systems that have to operate on limited energy resources, such as batteries or energy harvesters. Therefore, energy efficiency is one of the critical issues for these devices. To reduce energy consumption by cutting the total number of accesses to the memory and storage layers, storage-class memory (SCM) and data compression techniques are applied to eliminate data movement and shrink the data size, respectively. However, the information gap between them hinders cooperation between the two techniques and prevents further optimization of energy consumption. This work proposes a write-friendly arithmetic coding scheme that jointly manages both techniques to achieve energy-efficient non-volatile memory (NVM) systems. In particular, the concept of “ignorable bits” is introduced to further skip write operations while storing the compressed data into SCM devices. 
The proposed design was evaluated by a series of intensive experiments, and the results are encouraging.","PeriodicalId":186307,"journal":{"name":"2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133369410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yi Zhang, Jian Pang, Kiyoshi Yanagizawa, A. Shirane, K. Okada
{"title":"28GHz Phase Shifter with Temperature Compensation for 5G NR Phased-array Transceiver","authors":"Yi Zhang, Jian Pang, Kiyoshi Yanagizawa, A. Shirane, K. Okada","doi":"10.1145/3394885.3431650","DOIUrl":"https://doi.org/10.1145/3394885.3431650","url":null,"abstract":"A phase shifter with temperature compensation for 28GHz phased-array TRX is presented. A precise low-voltage current reference is proposed for the IDAC biasing circuit. The total gain variation for a single TX path including phase shifter and post stage amplifiers over -40°C to 80°C is only 1dB in measurement and the overall phase error due to temperature is less than 1 degree without off-chip calibration.","PeriodicalId":186307,"journal":{"name":"2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129502207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sitao Huang, Aayush Ankit, P. Silveira, Rodrigo Antunes, S. R. Chalamalasetti, I. E. Hajj, Dong Eun Kim, G. Aguiar, P. Bruel, S. Serebryakov, Cong Xu, Can Li, P. Faraboschi, J. Strachan, Deming Chen, K. Roy, Wen-mei W. Hwu, D. Milojicic
{"title":"Mixed Precision Quantization for ReRAM-based DNN Inference Accelerators","authors":"Sitao Huang, Aayush Ankit, P. Silveira, Rodrigo Antunes, S. R. Chalamalasetti, I. E. Hajj, Dong Eun Kim, G. Aguiar, P. Bruel, S. Serebryakov, Cong Xu, Can Li, P. Faraboschi, J. Strachan, Deming Chen, K. Roy, Wen-mei W. Hwu, D. Milojicic","doi":"10.1145/3394885.3431554","DOIUrl":"https://doi.org/10.1145/3394885.3431554","url":null,"abstract":"ReRAM-based accelerators have shown great potential for accelerating DNN inference because ReRAM crossbars can perform analog matrix-vector multiplication operations with low latency and energy consumption. However, these crossbars require the use of ADCs which constitute a significant fraction of the cost of MVM operations. The overhead of ADCs can be mitigated via partial sum quantization. However, prior quantization flows for DNN inference accelerators do not consider partial sum quantization which is not highly relevant to traditional digital architectures. To address this issue, we propose a mixed precision quantization scheme for ReRAM-based DNN inference accelerators where weight quantization, input quantization, and partial sum quantization are jointly applied for each DNN layer. We also propose an automated quantization flow powered by deep reinforcement learning to search for the best quantization configuration in the large design space. 
Our evaluation shows that the proposed mixed precision quantization scheme and quantization flow reduce inference latency and energy consumption by up to 3.89× and 4.84×, respectively, while only losing 1.18% in DNN inference accuracy.","PeriodicalId":186307,"journal":{"name":"2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115053811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Neural Network to Enable Run-Time Trade-off between Accuracy and Latency","authors":"Li Yang, Deliang Fan","doi":"10.1145/3394885.3431628","DOIUrl":"https://doi.org/10.1145/3394885.3431628","url":null,"abstract":"To deploy powerful deep neural networks (DNNs) on smart but resource-limited IoT devices, many prior works have proposed compressing DNNs to reduce network size and computation complexity with negligible accuracy degradation, using techniques such as weight quantization, network pruning, and convolution decomposition. However, conventional DNN compression methods generate a smaller, but fixed, network from a relatively large background model to achieve resource-limited hardware acceleration. Such optimization lacks the ability to adjust its structure in real time to adapt to dynamic hardware resource allocation and workloads. In this paper, we mainly review our two prior works [13], [15] that tackle this challenge, discussing how to construct a dynamic DNN by means of either uniform or non-uniform sub-net generation methods. Moreover, to generate multiple non-uniform sub-nets, [15] needs to fully retrain the background model for each sub-net individually, which we call the multi-path method. To reduce the training cost, in this work, we further propose a single-path sub-net generation method that can sample multiple sub-nets in different epochs within one training round. The constructed dynamic DNN, consisting of multiple sub-nets, provides the ability to trade off inference accuracy and latency at run time according to hardware resources and environment requirements. In the end, we study the dynamic DNNs with different sub-net generation methods on both the CIFAR-10 and ImageNet datasets. 
We also present the run-time tuning of accuracy and latency on both GPU and CPU.","PeriodicalId":186307,"journal":{"name":"2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115284409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Technology Mapper for Complex Universal Gates","authors":"Meng-Che Wu, A. Dao, Mark Po-Hung Lin","doi":"10.1145/3394885.3431561","DOIUrl":"https://doi.org/10.1145/3394885.3431561","url":null,"abstract":"Complex universal logic gates, which may have higher density and flexibility than basic logic gates and look-up tables (LUT), are useful for cost-effective or security-oriented VLSI design requirements. However, most of the technology mapping algorithms aim to optimize combinational logic with basic standard cells or LUT components. It is desirable to investigate optimal technology mappers for complex universal gates in addition to basic standard cells and LUT components. This paper proposes a novel technology mapper for complex universal gates with a tight integration of the following techniques: Boolean network simulation with permutation classification, supergate library construction, dynamic programming based cut enumeration, Boolean matching with optimal universal cell covering. Experimental results show that the proposed method outperforms the state-of-the-art technology mapper in ABC, in terms of both area and delay.","PeriodicalId":186307,"journal":{"name":"2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123639804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cheng Li, Jiangyuan Gu, S. Yin, Leibo Liu, Shaojun Wei
{"title":"Combining Memory Partitioning and Subtask Generation for Parallel Data Access on CGRAs","authors":"Cheng Li, Jiangyuan Gu, S. Yin, Leibo Liu, Shaojun Wei","doi":"10.1145/3394885.3431414","DOIUrl":"https://doi.org/10.1145/3394885.3431414","url":null,"abstract":"Coarse-Grained Reconfigurable Architectures (CGRAs) are attractive reconfigurable platforms with the advantages of high performance and power efficiency. In a CGRA based computing system, the computations are often mapped onto the CGRA with parallel memory accesses. To fully exploit the on-chip memory bandwidth, memory partitioning algorithms are widely used to reduce access conflicts. CGRAs have a fixed storage fabric and limited size memory due to the severe area constraints. Previous memory partitioning algorithms assumed that data could be completely transferred into the target memory. However, in practice, we often encounter situations where on-chip storage is insufficient to store the complete data. In order to perform the computation of these applications in the memory-limited CGRA, we first develop a memory partitioning strategy with continual placement, which can also avoid data preprocessing, and then divide the kernel into multiple subtasks that suit the size of the target memory. Experimental results show that, compared to the state-of-the-art method, our approach achieves a 43.2% reduction in data preparation time and an 18.5% improvement in overall performance. 
If the subtask generation scheme is adopted, our approach can achieve a 14.4% overall performance improvement while reducing memory requirements by 99.7%.","PeriodicalId":186307,"journal":{"name":"2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123648399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Standard Cell Routing with Reinforcement Learning and Genetic Algorithm in Advanced Technology Nodes","authors":"Haoxing Ren, Matthew R. Fojtik","doi":"10.1145/3394885.3431569","DOIUrl":"https://doi.org/10.1145/3394885.3431569","url":null,"abstract":"Standard cell layout in advanced technology nodes is done manually in the industry today. Automating the standard cell layout process, in particular the routing step, is challenging because of the constraints imposed by an enormous number of design rules. In this paper we propose a machine learning based approach that applies a genetic algorithm to create initial routing candidates and uses reinforcement learning (RL) to fix the design rule violations incrementally. A design rule checker feeds the violations back to the RL agent, and the agent learns how to fix them based on the data. This approach is also applicable to future technology nodes with unseen design rules. We demonstrate the effectiveness of this approach on a number of standard cells. We have shown that it can route a cell which is deemed unroutable manually, reducing the cell size by 11%.","PeriodicalId":186307,"journal":{"name":"2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121050297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Hierarchical Assessment Strategy on Soft Error Propagation in Deep Learning Controller","authors":"Ting Liu, Yuzhuo Fu, Yan Zhang, Bin Shi","doi":"10.1145/3394885.3431573","DOIUrl":"https://doi.org/10.1145/3394885.3431573","url":null,"abstract":"Deep learning techniques have been introduced into the field of intelligent controller design in recent years and have become an effective alternative in complex control scenarios. In addition to improving control robustness, deep learning controllers (DLCs) also provide potential fault tolerance to internal disturbances (such as soft errors) due to the inherent redundant structure of deep neural networks (DNNs). In this paper, we propose a hierarchical assessment to characterize the impact of soft errors on the dependability of a PID controller and its DLC alternative. Single-bit-flip injections in the underlying hardware and time series data collection from multiple abstraction layers (ALs) are performed on a virtual prototype system based on an ARM Cortex-A9 CPU, with a PID controller and a corresponding DLC implemented as a recurrent neural network (RNN) deployed on it. We employ generative adversarial networks and Bayesian networks to characterize the local and global dependencies caused by soft errors across the system. 
By analyzing cross-AL fault propagation paths and component sensitivities, we discover that the parallel data processing pipelines and regular feature size scaling mechanism in DLC can effectively prevent critical failure causing faults from propagating to the control output.","PeriodicalId":186307,"journal":{"name":"2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125802615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hsuan Hsiao, Joshua San Miguel, Yuko Hara-Azumi, J. Anderson
{"title":"Zero Correlation Error: A Metric for Finite-Length Bitstream Independence in Stochastic Computing","authors":"Hsuan Hsiao, Joshua San Miguel, Yuko Hara-Azumi, J. Anderson","doi":"10.1145/3394885.3431552","DOIUrl":"https://doi.org/10.1145/3394885.3431552","url":null,"abstract":"Stochastic computing (SC), with its probabilistic data representation format, has sparked renewed interest due to its ability to use very simple circuits to implement complex operations. Though unlike traditional binary computing, SC needs to carefully handle correlations that exist across data values to avoid the risk of unacceptably inaccurate results. With many SC circuits designed to operate under the assumption that input values are independent, it is important to provide the ability to accurately measure and characterize independence of SC bitstreams. We propose zero correlation error (ZCE), a metric that quantifies how independent two finite-length bitstreams are, and show that it addresses fundamental limitations in metrics currently used by the SC community. Through evaluation at both the functional unit level and application level, we demonstrate how ZCE can be an effective tool for analyzing SC bitstreams, simulating circuits and design space exploration.","PeriodicalId":186307,"journal":{"name":"2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126015634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marcel Walter, Winston Haaswijk, R. Wille, F. Sill, R. Drechsler
{"title":"One-pass Synthesis for Field-coupled Nanocomputing Technologies","authors":"Marcel Walter, Winston Haaswijk, R. Wille, F. Sill, R. Drechsler","doi":"10.1145/3394885.3431607","DOIUrl":"https://doi.org/10.1145/3394885.3431607","url":null,"abstract":"Field-coupled Nanocomputing (FCN) is a class of post-CMOS emerging technologies, which promises to overcome certain physical limitations of conventional solutions such as CMOS by allowing for high computational throughput with low power dissipation. Despite their promises, the design of corresponding FCN circuits is still in its infancy. In fact, state-of-the-art solutions still heavily rely on conventional synthesis approaches that do not take the tight physical constraints of FCN circuits (particularly with respect to routability and clocking) into account. Instead, physical design is conducted in a second step in which a classical logic network is mapped onto an FCN layout. Using this two-stage approach, with a classical and FCN-oblivious logic network as an intermediate result, frequently leads to substantial quality loss or completely impractical results. In this work, we propose a one-pass synthesis scheme for FCN circuits, which conducts both steps, synthesis and physical design, in a single run. For the first time, this allows the generation of exact, i.e., minimal FCN circuits for a given functionality.","PeriodicalId":186307,"journal":{"name":"2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129052599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}