{"title":"High-Accuracy FIR Filter Design Using Stochastic Computing","authors":"Bo Yuan, Yanzhi Wang","doi":"10.1109/ISVLSI.2016.63","DOIUrl":"https://doi.org/10.1109/ISVLSI.2016.63","url":null,"abstract":"The finite impulse response (FIR) filter is a basic functional component in various signal processing and communication systems. In many practical applications with stringent spectral requirements, long FIR filters are needed to achieve the desired filtering performance. However, because a T-tap FIR filter requires T high-complexity multipliers, a conventional long FIR filter consumes a large amount of silicon area and power. This paper, for the first time, proposes a high-accuracy stochastic computing (SC)-based FIR filter design. By exploiting the simplicity of stochastic arithmetic units, the proposed stochastic FIR filter achieves a significant reduction in hardware complexity compared to the conventional design. More importantly, this paper proposes a new high-accuracy non-scaled stochastic adder that achieves significantly higher computation accuracy than the conventional stochastic adder. Built on this new stochastic adder, the proposed stochastic FIR filter achieves much higher accuracy than the existing stochastic FIR filter design, especially for large T, thereby unlocking the potential for widespread application of stochastic FIR filters in practical signal processing systems.","PeriodicalId":140647,"journal":{"name":"2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122149528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Integrated Common Gate CTLE Receiver Front End with Charge Mode Adaptation","authors":"Divya Duvvuri, V. Pasupureddi","doi":"10.1109/ISVLSI.2016.105","DOIUrl":"https://doi.org/10.1109/ISVLSI.2016.105","url":null,"abstract":"This work presents the first common-gate continuous-time linear equalizer (CG-CTLE) with charge mode adaptation in 1.1 V, 65 nm CMOS technology. The proposed equalizer is realized with a common-gate topology and offers an input impedance of 50 Ω. It also acts as the first stage of a current-mode receiver and is made adaptive to varying channel loss. Therefore, both the need for external termination to avoid reflections and the need for a trans-impedance amplifier as the receiver's first stage are eliminated. The proposed CG-CTLE is compared with a conventional common-source (CS) CTLE, and it outperforms the CS-CTLE in terms of bandwidth and bit error rate (BER) for the same targeted output signal swing and power consumption. Post-layout results show that it offers an input impedance of 44.6 Ω, input-referred noise of 19.6 pA/√Hz, and BER = 10^-13, and it consumes 13.9 mW while operating at a data rate of 15 Gbps over a 7.5 inch FR4 PCB trace.","PeriodicalId":140647,"journal":{"name":"2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129073478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Post-Placement Optimization for Thermal-Induced Mechanical Stress Reduction","authors":"Tiantao Lu, Zhiyuan Yang, Ankur Srivastava","doi":"10.1109/ISVLSI.2016.69","DOIUrl":"https://doi.org/10.1109/ISVLSI.2016.69","url":null,"abstract":"This paper presents a post-placement technique for through-silicon-via (TSV) induced thermal mechanical stress reduction. Thermal mechanical stress causes several critical failures such as material fracture (interfacial delamination and silicon substrate cracking) and TSV stress migration (SM). The von Mises stress is used as a material fracture metric. An analytical TSV SM model is used, which replaces time-consuming finite-element-method (FEM) based simulation. The von Mises stress criterion and the analytical SM model are combined to form a unified placement optimization problem to alleviate both material fracture and SM problems. Considering the TSV-induced thermal mechanical stress profile strongly depends on TSV placement and thermal profile, iterative optimizations are performed to optimize the placement of TSVs and power-dissipating gates. Results show that compared to an initial reliability-unaware 3D placement, our design achieves 2.44x longer SM mean-time-to-failure (MTTF), 23% reduction in von Mises stress, with only 3% wirelength overhead.","PeriodicalId":140647,"journal":{"name":"2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130214496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VLSI Architecture for Cyclostationary Feature Detection Based Spectrum Sensing for Cognitive-Radio Wireless Networks and Its ASIC Implementation","authors":"M. S. Murty, R. Shrestha","doi":"10.1109/ISVLSI.2016.12","DOIUrl":"https://doi.org/10.1109/ISVLSI.2016.12","url":null,"abstract":"Cyclostationary feature detection for spectrum sensing in cognitive radio networks has significant prospects in future wireless communication systems. This work deals with the very-large-scale integration (VLSI) architectural transformation of such a detection algorithm for field-programmable gate-array (FPGA) prototyping and application-specific integrated-circuit (ASIC) design. A system-level design of this detection algorithm and the architectures of all its internal blocks are proposed in this paper. Subsequently, a performance analysis of the suggested detector in an additive white Gaussian noise (AWGN) environment has been carried out, in which it delivers a 0.95 probability of detection at -6 dB. Similarly, a performance comparison of the implemented and simulated detectors showed an absolute error of only 0.07. Finally, the proposed system-level architecture is synthesized and post-layout simulated using a 90 nm complementary metal-oxide-semiconductor (CMOS) technology node. It occupies 23.13 mm² of core area with 3663K gate equivalents and consumes a total power of 6.5 W at a 100 MHz clock frequency.","PeriodicalId":140647,"journal":{"name":"2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129409605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Angel-Eye: A Complete Design Flow for Mapping CNN onto Customized Hardware","authors":"Kaiyuan Guo, Lingzhi Sui, Jiantao Qiu, Song Yao, Song Han, Yu Wang, Huazhong Yang","doi":"10.1109/ISVLSI.2016.129","DOIUrl":"https://doi.org/10.1109/ISVLSI.2016.129","url":null,"abstract":"The Convolutional Neural Network (CNN) has become a successful algorithm in the field of artificial intelligence and a strong candidate for many applications. However, on embedded platforms, CNN-based solutions are still too complex to deploy if only a CPU is used for computation. Various dedicated hardware designs on FPGA and ASIC have been developed to accelerate CNNs, but few of them address the whole design flow for both fast deployment and high power efficiency. In this paper, we propose Angel-Eye, a programmable and flexible CNN processor architecture, together with a compilation tool and runtime environment. Evaluated on the Zynq XC7Z045 platform, Angel-Eye is 8× faster and 7× better in power efficiency than a peer FPGA implementation on the same platform. A face detection demo on the XC7Z020 is also 20× and 15× more energy efficient than counterparts on a mobile CPU and a mobile GPU, respectively.","PeriodicalId":140647,"journal":{"name":"2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"35 15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116655323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speeding up Incremental Legalization with Fast Queries to Multidimensional Trees","authors":"R. Netto, Vinicius S. Livramento, C. Guth, L. Santos, José Luís Almada Güntzel","doi":"10.1109/ISVLSI.2016.122","DOIUrl":"https://doi.org/10.1109/ISVLSI.2016.122","url":null,"abstract":"Circuit legalization removes overlaps and keeps cell alignment with power rails while minimizing total cell displacement. Legalization is applied not only after global placement, but also after incremental optimization steps like detailed placement, gate sizing, and buffer insertion. Applying full legalization after such incremental optimizations is too time-consuming. That is why physical synthesis has been shifting from entire circuit legalization to incremental mode legalization, which keeps legality after every primitive transformation. Unfortunately, recent incremental legalization strategies employ data structures that are not suitable for handling geometric data. This work proposes a new technique that relies on an R-tree, a data structure tailored to efficient geometric data storage where objects are represented by their minimum bounding box rectangles, which allows for fast spatial queries. As compared with state-of-the-art incremental legalization algorithms, the proposed technique is at least 6 times faster and performs as many successful legalizations.","PeriodicalId":140647,"journal":{"name":"2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133830609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design Space Exploration of FinFETs with Double Fin Heights for Standard Cell Library","authors":"Chi-Hung Lin, Chia-Shiang Chen, Yu-He Chang, Yu Ting Zhang, Shang-Rong Fang, Santanu Santra, Rung-Bin Lin","doi":"10.1109/ISVLSI.2016.72","DOIUrl":"https://doi.org/10.1109/ISVLSI.2016.72","url":null,"abstract":"This paper proposes a method to explore the design space of FinFETs with double fin heights. Our study shows that if one fin height is sufficiently larger than the other and the greatest common divisor of their equivalent transistor widths is small, the fin height pair will incur less width quantization effect and lead to better area efficiency. We design a standard cell library based on this technology using a tailored FreePDK15. With respect to a standard cell library designed with FreePDK15, about 86% of the cells designed with FinFETs of double fin heights have a smaller delay and 54% of the cells take a smaller area. We also demonstrate the advantages of FinFETs with double fin heights through chip designs using our cell library.","PeriodicalId":140647,"journal":{"name":"2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"140 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116454096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gossip NoC -- Avoiding Timing Side-Channel Attacks through Traffic Management","authors":"C. Reinbrecht, A. Susin, L. Bossuet, Martha Johanna Sepúlveda","doi":"10.1109/ISVLSI.2016.25","DOIUrl":"https://doi.org/10.1109/ISVLSI.2016.25","url":null,"abstract":"The wide use of multiprocessor systems-on-chip (MPSoCs) in embedded systems and the trend toward tighter integration between devices have made these systems vulnerable to attacks. Malicious software executed on a compromised IP may become a serious security problem. By snooping the traffic exchanged through the Network-on-Chip (NoC), it is possible to infer sensitive information such as secret keys. NoCs are vulnerable to side-channel attacks that exploit traffic interference as timing channels. When multiple IP cores are infected, they can work in coordination to implement a distributed timing attack (DTA). In this work we present, for the first time, the execution of a DTA and a security-enhanced NoC architecture able to avoid such timing attacks. Results show that our NoC proposal can avoid the DTA with an increase of only 1% in area and 0.8% in power relative to the whole chip design.","PeriodicalId":140647,"journal":{"name":"2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"164 9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132818940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Low-Density Parity-Check (LDPC) Code Decoding for Combating Asymmetric Errors in STT-RAM","authors":"Bohua Li, Yukui Pei, Wujie Wen","doi":"10.1109/ISVLSI.2016.9","DOIUrl":"https://doi.org/10.1109/ISVLSI.2016.9","url":null,"abstract":"Spin-transfer torque random access memory (STT-RAM) has emerged as a promising nonvolatile memory technology for its fast speed, small footprint and zero standby power. However, its unusually high and asymmetric error rates across different memory bit operations, which have been shown to be far beyond the correction capability of common error correction codes (ECCs), greatly hinder its application. In this work, we investigate the potential of the powerful low-density parity-check (LDPC) code to address the aggravated reliability issue in STT-RAM. We first develop a holistic STT-RAM channel model to quantitatively measure the asymmetric effects during the write and read processes for single-level-cell (SLC) and multi-level-cell (MLC) designs. We then propose an asymmetric LDPC (A-LDPC) decoding scheme to specifically enhance the asymmetric error correction capability. A hardware-friendly soft information metric dedicated to STT-RAM, namely the asymmetric Log-Likelihood Ratio (A-LLR), is also derived from the proposed channel model. Experimental results show that our A-LDPC outperforms existing ECCs by at least two/four orders of magnitude in combating asymmetric bit errors in SLC/MLC STT-RAM.","PeriodicalId":140647,"journal":{"name":"2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134511800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memristor-Based Discrete Fourier Transform for Improving Performance and Energy Efficiency","authors":"R. Cai, Ao Ren, Yanzhi Wang, Bo Yuan","doi":"10.1109/ISVLSI.2016.124","DOIUrl":"https://doi.org/10.1109/ISVLSI.2016.124","url":null,"abstract":"The memristor has emerged as one of the most promising candidates for the fundamental device of the beyond-CMOS era. With their unique advantage in implementing low-power, high-speed matrix multiplication, memristors have shown great potential in many specific applications. This paper, for the first time, investigates the hardware design of the discrete Fourier transform (DFT) using memristors. Two memristor-based DFT implementations are presented to effectively trade off hardware complexity against computing speed. Simulation results show that, compared to the conventional CMOS-based design, the proposed memristor-based design enables a significant reduction in computation latency and an improvement in power efficiency with very small accuracy loss. Specifically, the proposed memristor-based implementation achieves up to a 10X improvement in speed and a 109.8X improvement in power efficiency compared to the CMOS-based design.","PeriodicalId":140647,"journal":{"name":"2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116003972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}