{"title":"Multi-Thread Assembling for Fast FEM Power Delivery DC Integrity Analysis","authors":"Ke Yang, Shaoyi Peng, S. Tan, Hai-Bao Chen","doi":"10.1109/ASICON47005.2019.8983609","DOIUrl":"https://doi.org/10.1109/ASICON47005.2019.8983609","url":null,"abstract":"Power integrity analysis is of great significance in the field of circuit design, especially the design of modern high speed circuit system. For the high performance printed circuit boards (PCBs) and IC design, power delivery network DC integrity checks play an important role. However, the element assembling process in finite element method (FEM) can take significant portion of total computing time. In this paper, a fast finite element assembling method for power network DC integrity checks of PCBs is proposed. We divided the mesh into a series of bins and elements in different bins could be assembled in parallel. Furthermore, a dynamic circle shape approximation method is introduced to further control the number of elements due to vias and circular objectives. As a result, the new solver can easily perform progressive trade off between speed and accuracy. Experimental results of two PCB examples on a 3.6-GHz Intel i7 Dual-core CPU show that the proposed multi-thread assembling method can achieve 2X speedup over existing single-thread assembling methods. A dynamic circle shape approximation method is introduced to further control the number of elements and speed up the solver process. The resulting FEM solver leads to 3X speed over a commercial power integrity solver with no more than 0.7% errors.","PeriodicalId":319342,"journal":{"name":"2019 IEEE 13th International Conference on ASIC (ASICON)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130742483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and implementation of Serial ATA pbysical layer on FPGA","authors":"Xie Xie, Qinghua Duan, Jiafeng Liu, Jian Wang, Jinmei Lai","doi":"10.1109/ASICON47005.2019.8983634","DOIUrl":"https://doi.org/10.1109/ASICON47005.2019.8983634","url":null,"abstract":"An increasing number of high-performance computing system developed on FPGA devices need access to mass storage devices for storing data, the serial ATA protocol is widely used in the modern computer systems for transferring data between the host and hard disks or solid-state drives. This paper describes the design and implementation of serial ATA physical layer core based on the Xilinx GTX transceiver. With the method of cyclically changing the GTX line rate, the SATA hard disk with different line rate can be automatically identified and linked, realizing backward compatibility. An embedded system has also been developed for validating the functionality of our SATA physical layer core. We test our physical layer core with connecting our core to both SATA3 and SATA2 hard disks. The experimental result has indicated our core can not only provide the whole functionality required by the SATA physical layer, but also utilize very few logic resources on FPGA.","PeriodicalId":319342,"journal":{"name":"2019 IEEE 13th International Conference on ASIC (ASICON)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132383003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deploying and Optimizing Convolutional Neural Networks on Heterogeneous Architecture","authors":"Junning Jiang, Liang Cai, Feng Dong, Kehua Yu, Ke Chen, Wei Qu, Jianfei Jiang","doi":"10.1109/ASICON47005.2019.8983456","DOIUrl":"https://doi.org/10.1109/ASICON47005.2019.8983456","url":null,"abstract":"Deploying convolutional neural networks to hardware platform can accelerate the inference and is critical for the application of artificial intelligence. In this paper, we design an FPGA+CPU heterogeneous platform to accelerate CNNs. Dataflow optimizing, accelerator structure optimization and compute precision optimization are proposed to improve performance of the accelerating platform. Different ResNet and MobileNet networks are successfully deployed on the platform. By applying the proposed dataflow optimization and precision optimization, the performance improvement of inference is 3.25× on ResNet. By applying the accelerator structure optimization and precision optimization, the performance improvement of inference is 3.63× on MobileNet.","PeriodicalId":319342,"journal":{"name":"2019 IEEE 13th International Conference on ASIC (ASICON)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125348843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A digitalized RRAM-based Spiking Neuron Network system with 3-bit weight and unsupervised online learning scheme","authors":"Danqing Wu, Shilin Yan, Haodi Tang, Yu Wang, Jiayun Feng, Xianwu Hu, Jiaxin Cao, Yufeng Xie","doi":"10.1109/asicon47005.2019.8983603","DOIUrl":"https://doi.org/10.1109/asicon47005.2019.8983603","url":null,"abstract":"Resistive-switching Random Access Memory (RRAM) has emerged as a promising candidate for the artificial synaptic in neuromorphic computation circuits due to its similar electronic characteristics with the synaptic and features such as high integration density, non-volatile retention and supporting matrix-vector multiplication. In this paper, a digitalized RRAM-based fully-connected Spiking Neuron Network (SNN) system with 3-bit weight and unsupervised online learning scheme is proposed. It consists of 64 pre-neurons and 10 post-neurons, all the neurons are realized by digital circuits for low area overhead, low power consumption and high accuracy. An unsupervised online learning scheme based on binary STDP protocol is applied to train the synaptic weights. Experiments show that the system can be used to recognize the learned ten handwritten digits efficiently.","PeriodicalId":319342,"journal":{"name":"2019 IEEE 13th International Conference on ASIC (ASICON)","volume":"34 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126662255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Grain-Adaptive Computing Structure for FPGA CNN Acceleration","authors":"Xinyuan Qu, Zhihong Huang, Ning Mao, Yu Xu, Gang Cai, Zhen Fang","doi":"10.1109/ASICON47005.2019.8983480","DOIUrl":"https://doi.org/10.1109/ASICON47005.2019.8983480","url":null,"abstract":"In recent years, because of its superior performance and outstanding accuracy, convolutional neural networks (CNNs) are widely used in high-tech applications such as image classification and speech recognition. But it is more and more difficult to implement CNN in hardware platform due to the scale of CNN is increasing rapidly. FPGA attracts more attention compared with other processors for its excellent balance of flexibility and efficiency. There are many FPGA-based CNN accelerators proposed by previous work. However, in previous work the computing resource (especially DSP) is not fully utilized, either explicitly or covertly, which affects the CNN accelerator's overall performance seriously. In this work, we propose a new formula that provides a more accurate and comprehensive analysis to evaluate computing resource utilization, which can provide guidance for CNN accelerator design optimization. Then we propose a grain-adaptive computing structure for FPGA-based CNN acceleration, which can change flexibly to suit to and optimally utilize the available DSP resource. Due to the improvement of DSP utilization, we can achieve a more satisfactory result for both overall throughput performance and power efficiency. This architecture is implemented on Xilinx xcku115 based on AlexNet, the frequency is 150MHz and the peak power consumption is 30.05W. The overall performance is 1292.40 GOPS, 43.01 GOP/s/W, resulting in 2.28X and 1.94X, 9.44X and 3.02X improvement compared to previous work [6], [9] correspondingly.","PeriodicalId":319342,"journal":{"name":"2019 IEEE 13th International Conference on ASIC (ASICON)","volume":"150 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123281830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"20, 000-fps Visual Motion Magnification on Pixel-parallel Vision Chip","authors":"Junxian He, Xichuan Zhou, Yingcheng Lin, C. Sun, Cong Shi, N. Wu, Gang Luo","doi":"10.1109/ASICON47005.2019.8983493","DOIUrl":"https://doi.org/10.1109/ASICON47005.2019.8983493","url":null,"abstract":"This paper proposes a pixel-parallel Eulerian Video Magnification (EVM) algorithm for vision chips. The proposed algorithm is optimized for the stereotyped programmable pixel-parallel array processor architecture favored by high-speed vision chips. We also propose an improved pixel-parallel array processor with alternative image border padding modes to satisfy various algorithm requirements. We implemented an FPGA prototype of an improved 128 × 128 pixel-parallel array processor to run the proposed optimized EVM algorithm with a 120 MHz clock. Experimental results show that our pixel-parallel system can magnify subtle motion clues at a very high speed up to 20, 000 frames per second (fps).","PeriodicalId":319342,"journal":{"name":"2019 IEEE 13th International Conference on ASIC (ASICON)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121550242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Solution Processed Metal Oxide in Emerging Electronic Devices","authors":"Chun Zhao, Cezhou Zhao, T. Zhao","doi":"10.1109/asicon47005.2019.8983521","DOIUrl":"https://doi.org/10.1109/asicon47005.2019.8983521","url":null,"abstract":"Recently, solution processed metal oxide (MO) attracts wide interests due to the advantages including low-cost fabrication, procedure simplicity and vacuum-free technique. Within the paper, the synthesis mechanism of metal oxide deposited through solution process is firstly briefly introduced. Then the recent advances and progress on n-type solution processed MO semiconductors as well as the solution processed MO gate dielectrics have been reviewed for thin-film transistors.","PeriodicalId":319342,"journal":{"name":"2019 IEEE 13th International Conference on ASIC (ASICON)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126581641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MMV Subspace Pursuit (M-SP) Algorithm for Joint Sparse Multiple Measurement Vectors Recovery","authors":"Sujuan Liu, Lili Zheng, Lei Liu, Qianjin Lin","doi":"10.1109/ASICON47005.2019.8983646","DOIUrl":"https://doi.org/10.1109/ASICON47005.2019.8983646","url":null,"abstract":"In this paper, MMV Subspace Pursuit (M-SP) algorithm is proposed for solving joint sparse multiple measurement vectors (MMV) problem. The pre-selection and backtracking mechanisms are used in M-SP, so M-SP not only has higher recovery performance than some existing algorithms, but also significantly reduces the iteration number for improving the signal recovery efficiency. Simulations results show that M-SP and Simultaneous Compressive Sampling Matching Pursuit (SCoSaMP) have almost identical recovery performance and iteration times, but M-SP significantly reduces the computation complexity in per iteration. For example, when sparsity $K$ is 5, the computational complexity of M-SP is 24.0% of that of SCoSaMP in each iteration.","PeriodicalId":319342,"journal":{"name":"2019 IEEE 13th International Conference on ASIC (ASICON)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126101995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Inductorless 5-GHz Differential Dual Regulated Cross-Cascode Transimpedance Amplifier using 40 nm CMOS","authors":"Bai Song Samuel Lee, Hang-Ji Liu, Xiaopeng Yu, Jer-Ming Chen, K. Yeo","doi":"10.1109/ASICON47005.2019.8983663","DOIUrl":"https://doi.org/10.1109/ASICON47005.2019.8983663","url":null,"abstract":"This paper presents a new inductorless 5-GHz differential dual regulated cross-cascode transimpedance amplifier (DDRCCTIA) using UMC 40 nm CMOS technology. It consists of a differential cross-coupled input stage (DDRCC) that has a unique dual PMOS and NMOS regulated cascode loops as well as a frequency doubler with active inductor (FDAI) buffer stage. The design has a transimpedance gain of 62.5 dBΩ and bandwidth of 5.02 GHz. The power consumption is 7.34 mW from a 1.8 V supply, input referred noise current of 4.5 pA√Hz and a very small core area of 0.0018 mm2.","PeriodicalId":319342,"journal":{"name":"2019 IEEE 13th International Conference on ASIC (ASICON)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125308611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flat Pass-Band Method with Two RC Band-Stop Filters for 4-Stage Passive RC Polyphase Filter in Low-IF Receiver Systems","authors":"MinhTri Tran, Nene Kushita, A. Kuwana, Haruo Kobayashi","doi":"10.1109/ASICON47005.2019.8983611","DOIUrl":"https://doi.org/10.1109/ASICON47005.2019.8983611","url":null,"abstract":"This paper proposes a flat pass-band for a 4-stage passive RC polyphase filter in a blue-tooth low-IF receiver system; there the bandwidth is 8MHz, the center IF frequency is 4MHz, and the required image rejection ratio is <-30dB. Based on the superposition principle, the transfer function of this filter is derived. As the input signals are the wanted signals, there are two local maximum values which are calculated based on Cauchy-Schwarz inequality theorem at 160kHz (1.24dB) and 40MHz (1.24dB). Therefore, two RC band-stop filters are used to improve the pass-band of these local maximum values (improvement of the ripple gain of pass-band from 2dB into 0.47dB). As a result, a 4-stage passive RC poly-phase filter for a low-IF receiver is designed, where an image rejection ratio is -36dB, and the pass-band gain is flat (the ripple gain is 0.47dB).","PeriodicalId":319342,"journal":{"name":"2019 IEEE 13th International Conference on ASIC (ASICON)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131180213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}