{"title":"Efficient Hardware Implementation of Artificial Neural Networks Using Approximate Multiply-Accumulate Blocks","authors":"Mohammadreza Esmali Nojehdeh, L. Aksoy, M. Altun","doi":"10.1109/isvlsi49217.2020.00027","DOIUrl":"https://doi.org/10.1109/isvlsi49217.2020.00027","url":null,"abstract":"In this paper, we explore efficient hardware implementation of feedforward artificial neural networks (ANNs) using approximate adders and multipliers. We also introduce an approximate multiplier with a simple structure leading to a considerable reduction in the ANN hardware complexity. Because a parallel architecture requires a large area, the ANNs are implemented under the time-multiplexed architecture, where computing resources are re-used in the multiply-accumulate (MAC) blocks. The efficient hardware implementation of ANNs is realized by replacing the exact adders and multipliers in the MAC blocks with approximate ones while taking the hardware accuracy into account. Experimental results show that the ANNs designed using the proposed approximate multiplier have smaller area and consume less energy than those designed using previously proposed prominent approximate multipliers. 
It is also observed that the use of both approximate adders and multipliers yields up to a 64% reduction in energy consumption and up to a 43% reduction in area of the ANN design, with a slight decrease in the hardware accuracy, when compared to the exact adders and multipliers.","PeriodicalId":423851,"journal":{"name":"2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128257093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High Level Modeling of Memristive Crossbar Arrays","authors":"Md. Adnan Zaman, Rajeev Joshi, S. Katkoori","doi":"10.1109/isvlsi49217.2020.000-3","DOIUrl":"https://doi.org/10.1109/isvlsi49217.2020.000-3","url":null,"abstract":"Crossbar architecture is one of the prominent candidates to enable memristor-based in-memory computing. Recent literature suggests that predominantly SPICE-level simulations have been performed to check the correctness of the memristive systems. Though SPICE simulation gives accurate results, it takes a substantial amount of time as circuit complexity increases. Currently, memristor mapping tools (such as SIMPLER MAGIC) are not guaranteed to generate a correct design by construction as they do not provide any formal proof for their corresponding tools. The aforementioned reasons motivate us to come up with a behavioral model of the memristive system. We use two processes to model the memristor: one decides the final signal value when multiple sources drive it, and the other decides the final states of the memristors. The proposed model, along with the control voltage sequence and initial states of the memristors, allows us to quickly verify the functionality of the memristive system using VHDL-based simulation. While several SPICE-level models are available, to the best of our knowledge, this is the first work that proposes a behavioral VHDL model of a memristor. 
To validate our proposed approach, we compare our model with a SPICE-based model in terms of functional correctness and runtime speedup. Experimental evaluation on thirteen (13) different combinational benchmark circuits resulted in runtime speedups of 140X on average, within a range of 8X-205X.","PeriodicalId":423851,"journal":{"name":"2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"227 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131346433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast Linear Programming Optimization Using Crossbar-Based Analog Accelerator","authors":"Liuting Shang, Muhammad Adil, Ramtin Madani, C. Pan","doi":"10.1109/isvlsi49217.2020.00057","DOIUrl":"https://doi.org/10.1109/isvlsi49217.2020.00057","url":null,"abstract":"Linear programming optimization is critical to logistics management, engineering designs, and decision making in every area of the economy. Traditional hardware using GPU and CPU platforms for this purpose is significantly limited by the scaling of transistor size. In this paper, an analog in-memory computation circuit is proposed to accelerate linear programming optimization problems. The proposed scheme includes a memristor crossbar array and analog peripheral circuits that do not need ADC/DAC between each iteration of the algorithm. In addition, we discuss several key parameters related to interconnect parasitics and non-ideal device characteristics to provide practical guidelines. Furthermore, we propose three design schemes to mitigate the computation error that comes from the interconnect resistance in a large-scale crossbar array implementation. Optimal design parameters are quantitatively analyzed under a given number of memristance and array size. 
It is demonstrated that the proposed accelerator achieves energy consumption, area, and delay reductions of ~21×, ~151×, and ~33×, respectively, compared to 16nm-technology CMOS digital circuits for a 1000×1000 array with 6-bit precision.","PeriodicalId":423851,"journal":{"name":"2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"110 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113961771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 2^7 -1 Low-Power Half-Rate 16-Gb/s Charge-Mode PRBS Generator in 1.2V, 65nm CMOS","authors":"Prema Kumar Govindaswamy, V. Pasupureddi","doi":"10.1109/isvlsi49217.2020.00046","DOIUrl":"https://doi.org/10.1109/isvlsi49217.2020.00046","url":null,"abstract":"In this work, we propose a half-rate 2^7-1 pseudo-random bit sequence (PRBS) generator operating at 16-Gb/s that employs a highly power-efficient charge-mode circuit topology. At the target data rate, the proposed charge-mode implementation has the lowest power consumption compared to traditional current-mode PRBS generator implementations, thanks to the availability of high-speed switches in sub-100nm technologies. The proposed charge-mode half-rate PRBS generator is implemented in 1.2 V, 65-nm CMOS technology with a power consumption of 3.35 mW, timing jitter of 0.2 ps, and FoM of 0.02-pJ/bit at 16-Gb/s. Thus, the proposed power-efficient charge-mode implementation of the PRBS generator is an attractive candidate for on-chip bit-error-rate (BER) test and measurement applications.","PeriodicalId":423851,"journal":{"name":"2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"198 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133556367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inference and Energy Efficient Design of Deep Neural Networks for Embedded Devices","authors":"Ioannis Galanis, Iraklis Anagnostopoulos, Chinh Nguyen, Guillermo Bares, Dona Burkard","doi":"10.1109/isvlsi49217.2020.00017","DOIUrl":"https://doi.org/10.1109/isvlsi49217.2020.00017","url":null,"abstract":"Deep/Convolutional Neural Networks (DNNs/CNNs) are deployed on resource-constrained embedded devices in order to serve popular computer vision applications. However, DNNs have increased computing requirements, and battery-operated devices struggle to deliver acceptable performance. In this paper, we present an efficient design of DNNs for edge devices that performs a DNN architectural search. Our method finds alternative designs of DNNs that have lower energy consumption and inference time than ResNet reference networks. Experimental results show up to 78.82% reduction in energy consumption and 35.71% in inference time, while training up to 95.67% fewer networks. As a trade-off, our approach compromises the user Quality of Service by up to 2% compared to the reference networks.","PeriodicalId":423851,"journal":{"name":"2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130994167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Multi-grained Reconfigurable Accelerator for Approximate Computing","authors":"Yirong Kan, Man Wu, Renyuan Zhang, Y. Nakashima","doi":"10.1109/isvlsi49217.2020.00026","DOIUrl":"https://doi.org/10.1109/isvlsi49217.2020.00026","url":null,"abstract":"An elastic neural network is implemented on an FPGA to construct a multi-grained reconfigurable accelerator (MGRA). On the basis of a novel bisection neural network (BNN) topology, the entire network on hardware is efficiently partitioned into arbitrary pieces with a diamond-like shape (seen as \"DiaNet\"), which perform regressions for retrieving arbitrary approximate calculations in parallel. By organizing massive DiaNets, the entire network is reconfigurable at the fine-grained (functions of each DiaNet), mid-grained (DiaNet features), and coarse-grained (organization of DiaNets) levels without redundancy. In this work, a proof-of-concept BNN with 8x8 processing elements (PEs) is implemented on an FPGA for performing six calculation units (CUs) in parallel. Over various approximate computing tasks with one, two, and three operands, all calculations are retrieved with an inaccuracy of less than 3.1%. 
The maximum hardware utilization of a single CU is reduced to 1.7%, 17.9%, and 7.6% of that of a general arithmetic logic unit (ALU), an approximate computing unit powered by a domain-specific architecture (DSA), and an approximate computing unit powered by a neural network, respectively.","PeriodicalId":423851,"journal":{"name":"2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131353326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regulating Degree of Adaptiveness for Performance-Centric NoC Routing","authors":"T. S. Das, Navonil Chatterjee, P. Ghosal","doi":"10.1109/isvlsi49217.2020.00007","DOIUrl":"https://doi.org/10.1109/isvlsi49217.2020.00007","url":null,"abstract":"In the network-on-chip (NoC) communication framework, congestion in priority-fixed shortest routes may result in poor network performance in terms of increased packet latency and reduced throughput. Here, the employment of adaptive routing allows more freedom in selecting an alternate congestion-free route in a minimal or non-minimal direction. However, selecting an output link in a non-minimal direction based on local congestion information may degrade network performance rather than improve it, due to the increasing number of resource sharers along a longer route. Moreover, packet routing using a longer route may not support guaranteed throughput (GT) intensive real-time applications. In addition, allowing freedom in the non-minimal route increases the chance of deadlock and live-lock cycles occurring. In this work, we follow an adaptive routing approach that relies on reserving a virtual path for routing packets in both minimal and non-minimal directions while satisfying the application demands of meeting the hard deadline on packet arrival time and the guaranteed minimum throughput. In the proposed work, we also investigate the trade-off between the given routing flexibility and overall network performance in the presence of various data traffic patterns. 
Our experimental results reveal that fixing the deflection range in the non-minimal direction at run time is more beneficial than always selecting a specific value, as the deflection range varies based on the underlying application demands and the present network traffic situation.","PeriodicalId":423851,"journal":{"name":"2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114715698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D-Sorter: 3D Design of a Resource-Aware Hardware Sorter for Edge Computing Platforms Under Area and Energy Consumption Constraints","authors":"Amin Norollah, Z. Kazemi, D. Hély","doi":"10.1109/isvlsi49217.2020.00018","DOIUrl":"https://doi.org/10.1109/isvlsi49217.2020.00018","url":null,"abstract":"In this paper, we propose a 3-dimensional hardware sorting architecture (3D-Sorter) based on the Multi-Dimensional Sorting Algorithm (MDSA). The proposed architecture transforms a sequence of input records into a 3-dimensional matrix. Records of every dimension are sorted in several MDSA phases, using partial sorting methods. Our synthesis results, provided by Xilinx Vivado, indicate that the 3D-Sorter design decreases the number of Look-Up Tables (LUTs) and registers by 54% and 42.7%, respectively, compared to the state-of-the-art hardware sorter. Also, the power consumption is reduced by 48.15% on average. The results show that the proposed architecture offers remarkable power and area savings for edge components.","PeriodicalId":423851,"journal":{"name":"2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117194288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tunable Voltage-Mode Subthreshold CMOS Neuron","authors":"Margherita Ronchini, M. Zamani, H. Farkhani, F. Moradi","doi":"10.1109/ISVLSI49217.2020.00053","DOIUrl":"https://doi.org/10.1109/ISVLSI49217.2020.00053","url":null,"abstract":"To address the ever-increasing computational demands of machine learning applications, neuromorphic computing has emerged as a possible solution. The goal is to design a platform able to mimic the processing strategies of the brain. A neuromorphic system is composed of artificial neurons and synapses implemented in hardware with a high level of integration. Such implementations entail challenges including power efficiency, compactness, and biophysical resemblance. This work proposes a new implementation of a neuron circuit initially introduced by Wijekoon and Dudek. We show that the proposed neuron, designed in a standard 0.18µm CMOS process, consumes 58.5fJ/spike at 0.2V supply voltage. The area covered by the circuit is 16.8% of the area of the state-of-the-art implementation. This result was achieved by lowering the membrane capacitance and the number of transistors. In addition, spiking activity unfolds on a biological time scale rather than an accelerated one. The circuit preserves the possibility of being adjusted by external biases to attain different firing patterns.","PeriodicalId":423851,"journal":{"name":"2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132136563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging 3D Vertical RRAM to Developing Neuromorphic Architecture for Pattern Classification","authors":"Bokyung Kim, H. Li","doi":"10.1109/ISVLSI49217.2020.00054","DOIUrl":"https://doi.org/10.1109/ISVLSI49217.2020.00054","url":null,"abstract":"The crossbar architecture with resistive random-access memory (RRAM) devices presents many advantages in realizing matrix-based computations and has achieved success in neural network implementation. However, the rapid growth of network size demands even denser structures. In this paper, we investigate neuromorphic hardware design based on three-dimensional vertical RRAM (3D VRRAM) with an even/odd word line (WL) structure. The increased interconnects of VRRAM aggravate the chronic problems of the crossbar structure, such as sneak path currents. We address this issue by attaining a balanced structure with highly nonlinear RRAM devices. Furthermore, the impact of complicated signal routing and control due to the vertically stacked structure can be alleviated through architectural-level optimization. A three-layer VRRAM structure is demonstrated for neuromorphic design by showing that 8×8-pixel images were successfully classified into three alphabet characters on this structure. The example design also verifies that the 3D VRRAM with even/odd WL structure is beneficial for achieving high area efficiency.","PeriodicalId":423851,"journal":{"name":"2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"481 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132568325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}