{"title":"A Low-Power Recurrence-Based Radix 4 Divider Using Signed-Digit Addition","authors":"Matthew Gaalswyk, James W. Stine","doi":"10.1109/ISVLSI.2019.00077","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00077","url":null,"abstract":"This paper presents a novel radix-4 division by recurrence architecture that utilizes a hierarchical Signed-Digit (SD) adder. The implementations are easily generated from the methodology, which is well suited to digital implementation. Results are generated for several designs using GlobalFoundries 45nm SOI technology and ARM standard cells. Results indicate that power dissipation can be reduced using these architectures for division by recurrence, as the area is significantly decreased.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"86 1","pages":"391-396"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83977372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Approximate Energy Recovery 4-2 Compressor for Low-Power Sub-GHz IoT Applications","authors":"H. Thapliyal, Zachary Kahleifeh","doi":"10.1109/ISVLSI.2019.00081","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00081","url":null,"abstract":"Approximate computing is a circuit design technique that reduces area and power dissipation at the cost of accuracy. In this paper, we investigate further reducing the power dissipation of approximate circuits while maintaining high speed, using a form of energy recovery (ER) computing known as Pulse Boost Logic (PBL). To demonstrate power savings and speed capabilities, we have constructed an approximate 4-2 compressor circuit using PBL-based ER computing. Simulations were performed using 45nm technology in Cadence Spectre. At 800 MHz, our results show an average power saving of 64% for the PBL-based approximate 4-2 compressor compared to its standard CMOS-based design. We also show that a power saving of 89% can be achieved in the 4-2 compressor by combining approximate and ER computing, compared to a CMOS-based design of an accurate 4-2 compressor. Further, we show that the proposed PBL-based approximate 4-2 compressor consumes 65% less energy than the CMOS-based approximate 4-2 compressor. We have verified the functionality of the proposed PBL-based approximate 4-2 compressor up to 1 GHz to illustrate its applicability to low-power, low-energy Sub-GHz IoT applications.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"79 1","pages":"414-418"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87719016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Routing Performance Optimization for Homogeneous Droplets on MEDA-based Digital Microfluidic Biochips","authors":"Sarit Chakraborty, Susanta Chakraborty","doi":"10.1109/ISVLSI.2019.00082","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00082","url":null,"abstract":"Digital Microfluidic Biochips (DMFBs) offer automation, reconfigurability, low operational cost and accurate results. Such Lab-on-Chips (LoCs) are now extensively used in point-of-care diagnosis and other monitoring applications. Routing micro- or nano-litre (10^-6 or 10^-9 L) droplets on such chips raises a few critical challenges due to the blockages caused by microfluidic modules present on the chip. The Micro-Electrode Dot Array (MEDA) based DMFB architecture can facilitate cross-contamination-free routing and eliminate other routing issues of conventional DMF chips. This paper proposes a novel heuristic routing technique for MEDA-based DMFB architectures to tackle routing complexities due to overlapping nets, interfering blockages and deadlock zones formed by conflicting nets. We have categorized various region-based droplet movements on the MEDA chip and derived a metric named Snooping Index (SIn) to improve the routing performance of the droplets in the first phase. Next, an exhaustive search is applied to find the routing path for the remaining nets, considering different constraints specific to the MEDA platform. Finally, we have computed another measure called 'Zone Compaction Factor' (ZCF) to overcome blockage-extensive route paths. Experimental results on benchmark suites I and III show that our proposed technique significantly reduces latest arrival time, average assay execution time and number of used cells compared with earlier methods.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"33 1","pages":"419-424"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87353695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Area Effective Programmable Front-end Amplifier for Neural Signal Acquisition","authors":"Gopabandhu Hota, Hardik Agrawal, M. Sharad","doi":"10.1109/ISVLSI.2019.00046","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00046","url":null,"abstract":"Acquisition and analysis of neural signals have greatly changed our understanding of the brain. Neural implants are required to be as small as possible so that they are minimally invasive to normal body functioning. The neural signal contains frequency components from 0.1-10 kHz and amplitudes in the 10-100 µV range, which are very small and can easily be distorted by external noise sources. This demands a very area-efficient and low-noise Front-End Amplifier (FEA). Low supply voltage and low power dissipation are further critical requirements to ensure safe implantation and prolonged battery life. Keeping all these requirements in mind, we propose a programmable, area-efficient and low-noise FEA design along with both manual and SAR-based Gain Tuning and Offset Cancellation schemes, which are robust to temperature and process variations. The designed FEA occupies a minimal area of 0.05 mm², which shows great area efficiency compared with switched-capacitor based and closed-loop front-end amplifiers. The maximum voltage gain obtained in simulation is 87.6 dB, the input-referred noise density is 20 nV/√Hz, and the power consumption is 43.2 µW at a 1.8 V power supply, with a Noise Efficiency Factor (NEF) of 1.84. The proposed scheme has an offset cancellation capacity of up to 30 mV using a 7-bit transistor bank.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"38 1","pages":"207-211"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81503513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Post-Layout Simulation of Quasi-Adiabatic Logic Based Physical Unclonable Function","authors":"Yasuhiro Takahashi, Hiroki Koyasu, S. D. Kumar, H. Thapliyal","doi":"10.1109/ISVLSI.2019.00086","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00086","url":null,"abstract":"Silicon-based Physical Unclonable Functions (PUFs) are a popular hardware security primitive for mitigating security vulnerabilities. Recently, the Quasi-adiabatic logic based physical unclonable function (QUALPUF) was proposed by Kumar and Thapliyal. QUALPUF has ultra-low power dissipation; hence it is suitable for implementation in low-power portable electronic devices such as RFIDs, wireless sensor nodes, etc. In this paper, we present the post-layout simulation results of the 4-bit QUALPUF for low-power portable electronic devices. To evaluate uniqueness and reliability, the 4-bit QUALPUF is implemented in a 0.18 µm standard CMOS process with a 1.8 V supply voltage. The QUALPUF occupies 58.7 × 15.7 µm² of layout area. The post-layout simulation results show that the 4-bit QUALPUF has good uniqueness and reliability with 29.73 fJ/cycle/bit energy consumption.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"12 1","pages":"443-446"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80235908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy-efficient Analog Processing Architecture for Direction of Arrival with Microphone Array","authors":"Changlu Liu, T. Lan, Qin Li, Kaige Jia, Yidian Fan, Xing Wu, F. Qiao, W. Qi, Xinjun Liu, Huazhong Yang","doi":"10.1109/ISVLSI.2019.00097","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00097","url":null,"abstract":"Direction of arrival (DOA) estimation is a critical component in conventional smart acoustic systems for navigation, noise-canceling hearing aids and so on. However, conventional DOA has encountered power consumption and processing speed bottlenecks dominated by the analog-to-digital converter (ADC) and fast Fourier transform (FFT). Especially in always-on applications, the power-hungry ADC and time-consuming FFT take up most of the system's computation cost. We propose a novel processing architecture with analog-domain processing for DOA. The whole DOA processing procedure is implemented in the analog domain without an ADC or frequency-domain transformation. To verify the performance of the architecture, we simulate a generic DOA algorithm. Under a 0.18 µm CMOS process, the results show a 94.5% reduction in power consumption and a 4724× improvement in processing speed compared to a conventional digital realization. We simulate a simple task with a direction accuracy of 80.74%, which can be extended to more complex scenarios.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"105 1","pages":"507-512"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87481938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-Time Automatic Music Transcription (AMT) with Zync FPGA","authors":"Kevin Vaca, Archit Gajjar, Xiaokun Yang","doi":"10.1109/ISVLSI.2019.00075","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00075","url":null,"abstract":"A real-time automatic music transcription (AMT) system has great potential for applications and interactions between people and music, such as the popular devices Amazon Echo and Google Home. This paper thus presents a chord recognition design on the Zync7000 Field-Programmable Gate Array (FPGA), capable of sampling analog frequency signals through a microphone and, in real time, showing sheet music on a smartphone app that corresponds to the user's playing. We demonstrate the design of audio sampling on the programmable logic and the implementation of frequency transform and vector building on the processing system, which is an embedded ARM core on the Zync FPGA. Experimental results show that the logic design uses 574 slices of look-up tables (LUTs) and 792 slices of flip-flops. Because the dynamic power consumption of the processing system (1399 mW) is significantly higher than the dynamic power dissipation of the programmable logic (7 mW), future work on this platform is to design intellectual property (IP) for the frequency transform, pitch class profile (PCP), and pattern matching algorithms in a hardware description language (HDL), making the entire system-on-chip (SoC) able to be taped out as an application-specific design for consumer electronics.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"141 1","pages":"378-384"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80139863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IDE Development, Logic Synthesis and Buffer/Splitter Insertion Framework for Adiabatic Quantum-Flux-Parametron Superconducting Circuits","authors":"R. Cai, Xiaolong Ma, O. Chen, Ao Ren, Ning Liu, N. Yoshikawa, Yanzhi Wang","doi":"10.1109/ISVLSI.2019.00042","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00042","url":null,"abstract":"Josephson Junction (JJ) based superconductor logic families have been proposed and implemented to process analog and digital signals [1] for their low energy dissipation and ultrafast switching speed. Thanks to its resistance-less wires and ultrafast switches, superconductor logic can operate at clock frequencies of several tens of gigahertz and can be even hundreds of thousands of times as energy efficient as its CMOS counterparts. It has been perceived as an important candidate to replace state-of-the-art CMOS due to its superior potential in operation speed and energy efficiency, as recognized by the U.S. IARPA C3 and SuperTools Programs and the Japan MEXT-JSPS Project. The design and fabrication of superconducting circuits have already been established [2]-[4]. In addition, a prototype superconducting microprocessor \"Core 1\" was demonstrated in 2004 [3], which is able to execute instructions at a high clock frequency of several tens of gigahertz with extremely low power dissipation. These achievements make superconducting electronics highly promising for future high-performance computing applications. As one of the most mature superconducting technologies, Rapid-Single-Flux-Quantum (RSFQ) technology was proposed by K. Likharev, O. Mukhanov, and V. Semenov in 1985 [1]. Despite its capability to operate at an ultra-high speed of hundreds of GHz while maintaining extremely low switching energy (10^-19 J), it suffers from increasing static power due to the on-chip resistors required for the constant DC bias supply of the main RSFQ circuit. Numerous methods have been proposed to resolve the static power dissipation problem of RSFQ, including low-voltage RSFQ (LV-RSFQ) [5], reciprocal quantum logic (RQL) [6], LR-biased RSFQ [7] and energy-efficient single-flux quantum (eSFQ) [8]. The Adiabatic Quantum-Flux-Parametron (AQFP) technology, on the other hand, uses AC bias/excitation currents as both the multiphase clock signal and the power supply [9] to mitigate the power consumption overhead of DC bias, while operating at a frequency of a few GHz. Consequently, AQFP is remarkably energy efficient compared to RSFQ, albeit operating at a lower frequency. The energy-delay product (EDP) of AQFP circuits fabricated using processes such as the AIST standard process 2 (STP2) and the MIT-LL SFQ process [10], [11] is at least 200 times smaller than those of the other energy-efficient superconductor logics and only three orders of magnitude larger than the quantum limit [9]. Physical testing results of an AQFP 8-bit carry-look-ahead adder and large-scale circuits consisting of up to 10,000 AQFP logic gates have demonstrated that AQFP is a promising technology that is robust against circuit parameter variations [12]. Despite the high application potential of AQFP in VLSI circuits, a systematic, automatic synthesis framework for AQFP is still needed. There are two features of AQFP that prevent conventional CMOS synthesis methods from being directly applied to AQFP. In spi","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"92 1","pages":"187-192"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74945988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Framework for the Analysis of Throughput-Constraints of SNNs on Neuromorphic Hardware","authors":"Adarsha Balaji, Anup Das","doi":"10.1109/ISVLSI.2019.00043","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00043","url":null,"abstract":"Spiking neural networks (SNNs) are efficient computation models for spatio-temporal pattern recognition applications on neuromorphic hardware. Neuromorphic hardware is typically designed using interconnected crossbars, with each crossbar containing a structure of fully connected neurons. To ensure application performance, such as accuracy, and system performance, such as throughput and resource utilization, SNNs need to be efficiently mapped onto neuromorphic hardware. To address this, we propose a design flow to partition and map SNN-based applications onto neuromorphic hardware, with an aim to enhance application and system performance. The design flow operates in two steps: (1) a two-step clustering technique to partition trained SNNs into clusters of neurons and synapses, with an aim to minimize inter-cluster spike communication, and (2) mapping and scheduling the clusters onto crossbar-based architectures, modeled using Synchronous Data-flow Graphs (SDFGs). The SDFG model incorporates hardware constraints such as the I/O bandwidth of crossbars and synaptic memory while analyzing the throughput of the modeled system. Our design flow integrates CARLsim, a GPU-accelerated application-level SNN simulator, with SDF3, a tool to map SDFGs on hardware. We evaluate the design flow using synthetic and realistic SNN-based applications. We show that, for throughput-constrained applications, we achieve reductions of 21.74% in memory usage and 15.03% in utilization of the time-multiplexed interconnect, compared to a state-of-the-art approach.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"1 1","pages":"193-196"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75049569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning-based Prediction for Phase-Based Dynamic Architectural Specialization","authors":"Ruben Vazquez, Islam Badreldin, Mohamad Hammam Alsafrjalani, A. Gordon-Ross","doi":"10.1109/ISVLSI.2019.00101","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00101","url":null,"abstract":"Embedded computing systems are becoming increasingly complex, now performing tasks that were generally limited to desktop computing systems. However, embedded system designers are still required to adhere to stringent embedded design constraints (e.g., energy and area requirements) when designing such increasingly complex systems. To meet these constraints, configurable hardware components introduce configurable parameters (e.g., CPU voltage and frequency, cache size, cache associativity, cache line size, pipeline depth/width, etc.) that can be tuned to specific values to meet different design constraints (e.g., area, energy, performance, etc.) and user demands (e.g., increased battery life, increased performance, or a desired trade-off), which translates to a better user experience. However, determining these specific parameter values is increasingly difficult and time-consuming as the configurable parameter design space increases. This issue is further complicated because each application has a different set of optimal/best parameter values based on these demands and requirements. Furthermore, repetitious application behavior, known as phases, which occurs throughout an application's runtime, can be exploited by tracking each phase's unique optimal parameter values, resulting in a multiplicative or exponential increase in the size of the configuration space. In this paper, we propose a machine learning-based methodology to significantly reduce the time required to find the optimal configurable parameter values for the instruction and data caches for each application phase. In our method, we use artificial neural networks (ANNs) to predict the optimal configuration for application phases. We collect execution statistics for use as features for an application phase and use feature reduction to significantly reduce the feature-set size. We show that ANNs exhibit high, stable accuracy over multiple training and testing iterations. We also show that applications exhibit low energy degradation (less than 1%) for both the instruction and data caches using our methodology.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"5 1","pages":"529-534"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91506287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}