{"title":"A Booth-based Digital Compute-in-Memory Macro for Processing Transformer Model","authors":"Zhongyuan Feng, Bo Wang, Zhaoyang Zhang, An Guo, Xin Si","doi":"10.1109/APCCAS55924.2022.10090256","DOIUrl":"https://doi.org/10.1109/APCCAS55924.2022.10090256","url":null,"abstract":"Transformer models have achieved excellent results in many fields. Owing to their huge data volume and high precision requirements, traditional analog compute-in-memory circuits can no longer meet their needs. To solve this dilemma, this paper proposes a digital compute-in-memory circuit based on an improved Booth algorithm. The 6T SRAM array stores the multiplicand, the multiplier is encoded by the Booth encoder, and local computing cells (LCCs) then read the corresponding values from the array according to the encoding result. These values are finally sent to the dual-mode shift and add module (DMSA) to obtain the computation results. The proposed circuit achieves an energy efficiency of 33.11 TOPS/W@INT8 and 8.3 TOPS/W@INT16, and 1.92× better energy efficiency than previous works.","PeriodicalId":243739,"journal":{"name":"2022 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129853800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGA Implementation of Matrix Decomposition Based FIR Filter","authors":"Hao Wang, Jia Yan","doi":"10.1109/APCCAS55924.2022.10090350","DOIUrl":"https://doi.org/10.1109/APCCAS55924.2022.10090350","url":null,"abstract":"A matrix decomposition (MD) based finite impulse response (FIR) filter can synthesize any FIR filter with far fewer coefficients, without affecting the group delay and only slightly affecting the frequency-domain design error. Several researchers have advanced the theoretical analysis of the MD-FIR filter since it was first proposed. As previous research has focused entirely on theoretical analysis, this study presents the first FPGA implementation of MD-FIR filters. First, a continuous-coefficient MD-FIR filter is designed using a well-developed method. Then, this MD-FIR filter is implemented in Matlab Simulink. Afterwards, the Verilog code for the MD-FIR filter is automatically generated from the Matlab Simulink implementation. Finally, based on this Verilog code, the MD-FIR filter is simulated and implemented on a Field Programmable Gate Array (FPGA). The results verify the effectiveness of the MD-FIR filter.","PeriodicalId":243739,"journal":{"name":"2022 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS)","volume":"1891 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130021001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical DNN with Heterogeneous Computing Enabled High-Performance DNA Sequencing","authors":"Shaobo Luo, Zhiyuan Xie, Gengxin Chen, Lei Cui, Mei Yan, Xiwei Huang, Shuwei Li, Changhai Man, Wei Mao, Hao Yu","doi":"10.1109/APCCAS55924.2022.10090281","DOIUrl":"https://doi.org/10.1109/APCCAS55924.2022.10090281","url":null,"abstract":"DNA sequencing is a popular tool to demystify the code of living organisms and is reforming the medical, pharmaceutical and biotech industries. Next-Generation Sequencing (NGS) plays a vital role in high-throughput DNA sequencing with massively parallel data generation. Nevertheless, the massive amount of data poses great challenges for data analysis. It is difficult to reach a low error rate when handling noisy and/or biased signals caused by imperfect biochemical reactions and imaging systems. Furthermore, a homogeneous computing system lacks sufficient computing power and memory bandwidth. Therefore, in this work, a heterogeneous computing platform with a hierarchical deep neural network sequencing pipeline is proposed to improve sequencing quality and increase processing speed. Experiments demonstrate that the proposed work reaches higher effective throughput (12.18% more clusters found), a lower error rate (0.0175%), a higher quality score (%Q30 99.27%), and 19% faster processing. The reported work empowers virus detection, disease diagnostics, and other potential biomedical applications.","PeriodicalId":243739,"journal":{"name":"2022 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129551161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Energy-Efficient Mixed-Bit ReRAM-based Computing-in-Memory CNN Accelerator with Fully Parallel Readout","authors":"Dingbang Liu, Wei Mao, Haoxiang Zhou, Jun Liu, Qiuping Wu, Haiqiao Hong, Hao Yu","doi":"10.1109/APCCAS55924.2022.10090365","DOIUrl":"https://doi.org/10.1109/APCCAS55924.2022.10090365","url":null,"abstract":"Computing-in-memory (CIM) accelerators integrate storage and computing, which gives them the potential to break through the limits of Moore's law and the bottleneck of the von Neumann architecture. However, the performance of CIM accelerators is still limited by conventional CNN architectures and inefficient readouts. To improve energy efficiency, edge-computing hardware requires an optimized CNN model and a low-power, fully parallel readout. In this work, a ReRAM-based CNN accelerator is designed. Mixed-bit (1~8-bit) operations are supported by a bitwidth configuration scheme for implementing Neural Architecture Search (NAS)-optimized multi-bit CNNs. In addition, energy-efficient fully parallel readout is achieved by a variation-reduction accumulation mechanism and low-power readout circuits. Benchmarks show that the proposed ReRAM accelerator can achieve a peak energy efficiency of 2490.32 TOPS/W for 1-bit operation and an average energy efficiency of 479.37 TOPS/W for 1~8-bit operations when evaluating NAS-optimized multi-bitwidth CNNs.","PeriodicalId":243739,"journal":{"name":"2022 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS)","volume":"55 8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113939342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Low-latency Multi-format Carrier Phase Recovery Hardware for Coherent Optical Communication","authors":"Changlong Lv, Liyu Lin, Honghui Deng, Junhui Wang, Yun Chen","doi":"10.1109/APCCAS55924.2022.10090345","DOIUrl":"https://doi.org/10.1109/APCCAS55924.2022.10090345","url":null,"abstract":"Fast and accurate carrier estimation is particularly vital in modern coherent optical communication. Achieving a real-time system with low latency and high throughput is challenging, and the implementation structure of such a system is essential given the increasing complexity of algorithms and data decimals. This paper proposes a carrier phase recovery (CPR) implementation based on blind phase search (BPS) with data multiplexing. All computations are performed in Cartesian coordinates, without coordinate conversion, standard multipliers, or DSPs. When this circuit structure is evaluated on the Xilinx ZCU102 platform, the clock frequency reaches 620 MHz, and the latency of processing a 79.3 Gbps 16QAM signal is 18 clock cycles.","PeriodicalId":243739,"journal":{"name":"2022 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS)","volume":"41 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120884141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 97 fJ/Conversion Neuron-ADC with Reconfigurable Sampling and Static Power Reduction","authors":"Jinbo Chen, Hui Wu, Jie Yang, M. Sawan","doi":"10.1109/APCCAS55924.2022.10090325","DOIUrl":"https://doi.org/10.1109/APCCAS55924.2022.10090325","url":null,"abstract":"A bio-inspired Neuron-ADC with reconfigurable sampling and static power reduction for biomedical applications is proposed in this work. The Neuron-ADC leverages level-crossing sampling and a bio-inspired refractory circuit to compressively convert bio-signals into digital spikes and information of interest. The proposed design not only avoids dissipating ADC energy on unnecessary data but also achieves reconfigurable sampling, making it suitable for either low-power operation or high-accuracy conversion when dealing with various kinds of bio-signals. Moreover, the proposed dynamic comparator reduces static power by up to 41.1% when tested with a 10 kHz sinusoidal input. Simulation results in a 40 nm CMOS process show that the Neuron-ADC achieves a maximum ENOB of 6.9 bits with a corresponding FoM of 97 fJ/conversion under a 0.6 V supply voltage.","PeriodicalId":243739,"journal":{"name":"2022 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115872607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chaotic Sampling of Double Scroll Chaos for Digital Random Number Generation","authors":"Onur Karatas, Kaya Demir, Salih Ergün","doi":"10.1109/APCCAS55924.2022.10090396","DOIUrl":"https://doi.org/10.1109/APCCAS55924.2022.10090396","url":null,"abstract":"This article introduces a random number generator (RNG) based on chaotic oscillators. Digitally obtained double scroll chaos was used as the basis for the RNG design. The RNG is implemented on an FPGA at the register transfer level, using a third-order ordinary differential equation to generate double scroll chaos. A 32-bit fixed-point number format with a 5-bit signed integer part was used in the implementation. One chaotic signal was used as the source, and another chaotic signal was employed to sample it and generate random bits. The proposed RNG is built using the Verilog hardware description language and experimentally demonstrated on a Xilinx Virtex VC707 FPGA. The collected bits were subjected to the FIPS 140-2 randomness test suite and passed all tests.","PeriodicalId":243739,"journal":{"name":"2022 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122260187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical and Recursive Floorplanning Algorithm for NoC-Based Scalable Multi-Die FPGAs","authors":"Jianwen Luo, Xinzhe Liu, Fupeng Chen, Y. Ha","doi":"10.1109/APCCAS55924.2022.10090338","DOIUrl":"https://doi.org/10.1109/APCCAS55924.2022.10090338","url":null,"abstract":"Emerging applications are calling for significantly larger FPGAs with multiple dies. However, multi-die FPGAs with a traditional substrate-based interconnection are not scalable enough, because the execution time and probability of failure of their floorplanning algorithms increase dramatically as the design size or the number of dies grows. Therefore, future multi-die FPGAs will require a scalable interconnection architecture and an associated floorplanning algorithm. To address this issue, we propose both a new NoC-based scalable multi-die FPGA architecture and a corresponding floorplanning algorithm, namely the Hierarchical and Recursive Floorplanning Algorithm (HRFA). First, we introduce the interconnection architecture with a class of scalable hierarchical topologies. Second, we formulate the floorplanning problem for the proposed NoC architecture as an Integer Linear Programming (ILP) problem. Third, we develop a novel recursive method to solve the ILP formulation by exploiting the parallelization opportunities offered by the hierarchical interconnection architectures. Experiments on a Convolutional Neural Network (CNN) benchmark show that the scalability of our proposed technique, measured by the size of the feasible benchmark, is at least 3× that of the state-of-the-art solutions, with no loss of design performance.","PeriodicalId":243739,"journal":{"name":"2022 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125169773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis and Design of High-Speed and Low-Distortion Bootstrapped Switches","authors":"Jing Ma, Yuanqi Hu","doi":"10.1109/APCCAS55924.2022.10090277","DOIUrl":"https://doi.org/10.1109/APCCAS55924.2022.10090277","url":null,"abstract":"Bootstrapped switches are widely used in applications requiring high-speed and high-linearity sampling. However, the parasitic effects of switches that cause distortion have not been thoroughly analysed before. In this paper, three fundamental non-ideal factors are explored, namely source-drain exchange, parasitic capacitive division and charge redistribution, and the impact of each is mathematically derived. It is found that both the on-resistance and the bootstrapped capacitor have a quadratic (40 dB/dec) impact on the third harmonic distortion of sampled signals. A design guideline is then proposed and verified by a 2-MSPS, 90-dB-THD design case. Comprehensive simulation with various design parameters shows that our design flow can feasibly optimise the design cost without sacrificing performance, and is hence useful for designers of bootstrapped switches.","PeriodicalId":243739,"journal":{"name":"2022 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127592537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Approximate-Computing-Based Adaptive Equalizer for Polarization Mode Dispersion","authors":"Liyu Lin, Junhui Wang, Xiaoyang Zeng, Yun Chen","doi":"10.1109/APCCAS55924.2022.10090404","DOIUrl":"https://doi.org/10.1109/APCCAS55924.2022.10090404","url":null,"abstract":"Computational complexity is the most significant drawback of coherent optical communication: it consumes a large area and leads to high power consumption, especially in the adaptive filter used for polarization mode dispersion (PMD). In this paper, we implement a 9-tap intra-polarization and 1-tap inter-polarization equalizer, which requires 34.4% fewer multiplications than the conventional structure. In addition, we propose an approximate multiplier that saves 44.6% of full adders. Under QPSK modulation, the proposed equalizer has a throughput of 114 Gb/s and a power of 463 mW at 1.786 GHz. Synthesis shows that the area of the proposed 16-way parallel adaptive equalizer is 0.365 mm² in a 28 nm process, an improvement of 27.86% in area and 37.88% in energy efficiency over the fixed-point structure.","PeriodicalId":243739,"journal":{"name":"2022 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS)","volume":"369 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126709360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}