IEEE Transactions on Very Large Scale Integration (VLSI) Systems: Latest Articles

A 28-Gb/s Single-Ended PAM-4 Transceiver With Active-Inductor Equalizer and Amplitude-Detection LSB Decoder for Memory Interfaces
IF 2.8, Zone 2, Engineering Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 33, No. 3, pp. 662-672. Pub Date: 2024-11-25. DOI: 10.1109/TVLSI.2024.3496878
Hwaseok Shin; Hyoshin Kang; Yoonjae Choi; Jincheol Sim; Jonghyuck Choi; Youngwook Kwon; Seungwoo Park; Seongcheol Kim; Changmin Sim; Junseob So; Taehwan Kim; Chulwoo Kim
Abstract: This study proposes a power-efficient 28-Gb/s single-ended four-level pulse amplitude modulation (PAM-4) transceiver (TRX) for next-generation memory interfaces. In the transmitter (TX), an active-inductor equalizer (EQAI) is utilized, while in the receiver (RX), an amplitude-detection least significant bit (LSB) decoder is employed. In the TX, conventional equalization techniques consume substantial power owing to the inclusion of additional components and the strong driving power required to mitigate channel-induced intersymbol interference (ISI). However, the proposed EQAI achieves a bandwidth extension up to the Nyquist frequency through gain boosting while reducing hardware costs and minimizing the driving strength. This results in a simple structure with operational efficiency, facilitating low power consumption and a compact area compared with conventional TX equalizers. In a PAM-4 RX, the power dissipation is proportional to the clock buffer and the number of comparators used for data decoding. To improve the hardware cost and the power usage in the RX, the proposed RX design utilizes an amplitude-detection LSB decoder, which reduces the number of comparators and comprises a one-stage structure by detecting the amplitude differences between the reference and input voltages during LSB decoding. This improves the hardware cost and power consumption while implementing a one-tap direct decision feedback equalizer (DFE). The TRX for memory interfaces is optimized for low-power performance by employing these methods, resulting in a notable energy efficiency of 0.96 pJ/bit. This structure is fabricated using a 28-nm CMOS technology, and the core area of the TRX occupies 0.0053 mm².
Citations: 0
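As a rough illustration of the idea in the abstract above, the sketch below compares conventional three-threshold PAM-4 slicing with an LSB decision taken from the amplitude difference between the input and a reference. It is not the paper's circuit: the normalized levels (-3, -1, +1, +3), the Gray mapping, and the function names are all assumptions made for this example. The helper `link_power_mw` simply converts the reported 0.96 pJ/bit at 28 Gb/s into an implied power figure.

```python
# Illustrative model only: ideal, noise-free PAM-4 levels with an ASSUMED Gray mapping.
# Nothing here is taken from the paper beyond the 0.96 pJ/bit and 28 Gb/s figures.
LEVELS = {-3.0: (0, 0), -1.0: (0, 1), 1.0: (1, 1), 3.0: (1, 0)}  # level -> (MSB, LSB)


def decode_three_thresholds(v):
    """Conventional slicing: compare the input against -2, 0, and +2."""
    msb = int(v > 0.0)
    # With the assumed Gray mapping, the LSB is 1 for the two inner levels.
    lsb = int(-2.0 < v < 2.0)
    return msb, lsb


def decode_amplitude(v, v_ref=0.0, amp_threshold=2.0):
    """Amplitude-based slicing: MSB from the sign, LSB from |v - v_ref|."""
    msb = int(v > v_ref)
    lsb = int(abs(v - v_ref) < amp_threshold)
    return msb, lsb


def link_power_mw(energy_pj_per_bit, data_rate_gbps):
    """Power in mW equals (pJ/bit) * (Gb/s), because 1 pJ * 1 Gb/s = 1 mW."""
    return energy_pj_per_bit * data_rate_gbps


if __name__ == "__main__":
    for level in LEVELS:
        print(level, decode_three_thresholds(level), decode_amplitude(level))
    print("implied power (mW):", link_power_mw(0.96, 28))
```

In this idealized model both decoders return the same bits for every level, and the reported energy efficiency corresponds to roughly 26.9 mW at 28 Gb/s.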
Securet3d: An Adaptive, Secure, and Fault-Tolerant Aware Routing Algorithm for Vertically–Partially Connected 3D-NoC
IF 2.8, Zone 2, Engineering Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 33, No. 1, pp. 275-287. Pub Date: 2024-11-25. DOI: 10.1109/TVLSI.2024.3500575
Alexandre Almeida da Silva; Lucas Nogueira; Alexandre Coelho; Jarbas A. N. Silveira; César Marcon
Abstract: Multiprocessor systems-on-chip (MPSoCs) based on 3-D networks-on-chip (3D-NoCs) are crucial architectures for robust parallel computing, efficiently sharing resources across complex applications. To ensure the secure operation of these systems, it is essential to implement adaptive, fault-tolerant mechanisms capable of protecting sensitive data. This work proposes the Securet3d routing algorithm, which establishes secure data paths in fault-tolerant 3D-NoCs. Our approach enhances the Reflect3d algorithm by introducing a detailed scheme for mapping secure paths and improving the system’s ability to withstand faults. To validate its effectiveness, we compare Securet3d with three other fault-tolerant routing algorithms for vertically-partially connected 3D-NoCs. All algorithms were implemented in SystemVerilog and evaluated through simulation using ModelSim and hardware synthesis with Cadence’s Genus tool. Experimental results show that Securet3d reduces latency and enhances cost-effectiveness compared with other approaches. When implemented with a 28-nm technology library, Securet3d demonstrates minimal area and energy overhead, indicating scalability and efficiency. Under denial-of-service (DoS) attacks, Securet3d maintains essentially unaltered average packet latencies of 70, 90, and 29 clock cycles for uniform random, bit-complement, and shuffle traffic, respectively, significantly lower than those of the other algorithms, which lack security mechanisms (5763, 4632, and 3712 clock cycles on average, respectively). These results highlight the superior security, scalability, and adaptability of Securet3d for complex communication systems.
Citations: 0
Retry-Based Synchronization for Online Testing of Identical Logic Blocks
IF 2.8, Zone 2, Engineering Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 33, No. 5, pp. 1447-1451. Pub Date: 2024-11-25. DOI: 10.1109/TVLSI.2024.3501402
Irith Pomeranz
Abstract: State-of-the-art designs include identical instances of logic blocks to support parallel computations. Identical logic blocks at close physical proximity can be tested online by comparing their output sequences. This removes the need for known input and output sequences. To use output comparison for two logic blocks, $B_{0}$ and $B_{1}$, the logic blocks should be synchronized to the same state, and the same input sequence should be applied to them. Assuming that $B_{0}$ performs functional computations and $B_{1}$ is idle, a process described earlier synchronizes $B_{1}$ to the state of $B_{0}$ by using a synchronization period where $B_{1}$ receives the input sequence of $B_{0}$, and values of selected state variables are copied from $B_{0}$ to $B_{1}$. A single synchronization period was used earlier. The first key contribution of this article is to introduce a retry-based synchronization process with multiple synchronization periods to avoid flagging synchronization failures as faults. The second contribution of this article is to develop the synchronization process in a simulation environment that considers functional operation conditions. Experimental results for benchmark circuits demonstrate the effectiveness of the retry-based process and the importance of the functional simulation environment.
Citations: 0
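The retry idea in the abstract above can be sketched with a toy model. Everything specific here is an assumption for illustration: the 4-bit next-state function `step`, the choice to copy state variable 3 at the start of each period, the period length, and the retry limit are hypothetical and not taken from the article. The point is only that one synchronization period can fail without any fault being present, while a second period succeeds.

```python
# Toy model (hypothetical): two identical 4-bit blocks B0 and B1.
# Bits 0..2 form a shift register fed by the input; bit 3 depends on itself,
# so it cannot be synchronized by applying the same inputs alone.

def step(state, inp):
    """Next-state function shared by both blocks."""
    s0, s1, s2, s3 = state
    return (inp, s0, s1, s3 ^ s2)


def sync_period(b0, b1, inputs, copied=(3,)):
    """One synchronization period: copy selected state variables from B0 to B1,
    then apply B0's input sequence to both blocks."""
    b1 = list(b1)
    for i in copied:
        b1[i] = b0[i]
    b1 = tuple(b1)
    for inp in inputs:
        b0 = step(b0, inp)
        b1 = step(b1, inp)
    return b0, b1


def synchronize(b0, b1, input_stream, period_len=3, max_retries=3):
    """Retry synchronization periods until the states match.
    Returns (attempt number or None, final B0 state, final B1 state)."""
    stream = iter(input_stream)
    for attempt in range(1, max_retries + 1):
        inputs = [next(stream) for _ in range(period_len)]
        b0, b1 = sync_period(b0, b1, inputs)
        if b0 == b1:
            return attempt, b0, b1
    return None, b0, b1


if __name__ == "__main__":
    inputs = [1, 0, 1, 1, 0, 0, 1, 1, 0]
    print(synchronize((1, 0, 1, 1), (0, 0, 1, 0), inputs))
```

With these starting states, the first period leaves bit 3 mismatched because bit 2 still differed while it was being updated; the second period succeeds once the shift-register bits agree. A single-period scheme would have reported this fault-free mismatch as a failure.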
Area-Efficient Pipeline Architecture for Serial Real-Valued Fast Fourier Transform
IF 2.8, Zone 2, Engineering Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 33, No. 5, pp. 1427-1431. Pub Date: 2024-11-25. DOI: 10.1109/TVLSI.2024.3496922
Kun Li; Hongji Fang; Zhenguo Ma; Feng Yu; Bo Zhang; Qianjian Xing
Abstract: This brief presents a novel pipeline architecture designed to compute the fast Fourier transform (FFT) on real input signals in a serial format. This architecture significantly improves resource efficiency by sharing adders between butterfly and rotator structures. In addition, a novel data management approach for N-point radix-2 serial real-valued FFT (RFFT) has been proposed, which not only simplifies the data reordering circuit between processing elements (PEs) but also achieves natural order data output. The real-valued 1024-point FFT has been implemented on a field-programmable gate array (FPGA). Compared with the typical real-valued serial commutator (RSC) FFT architecture, the proposed architecture achieves substantial improvement, including a reduction of 10.3% in the number of lookup tables (LUTs) and 12.5% in flip-flops (FFs).
Citations: 0
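For background on why real-valued FFTs can be cheaper than complex ones, the sketch below checks the conjugate symmetry X[N-k] = conj(X[k]) that holds for any real input, which means only N/2 + 1 outputs carry independent information. This is a plain recursive radix-2 FFT written for illustration; it is not the pipeline architecture described in the abstract above, and the function name is an assumption.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: rotate the odd branch by the twiddle factor, then add/subtract.
        rotated = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + rotated
        out[k + n // 2] = even[k] - rotated
    return out


if __name__ == "__main__":
    signal = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.5]  # real-valued input
    spectrum = fft_radix2(signal)
    n = len(signal)
    for k in range(1, n // 2):
        # The upper half of the spectrum mirrors the lower half.
        print(k, spectrum[k], spectrum[n - k].conjugate())
```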
A 7.4–9.2-GHz Fractional-N Differential Sampling PLL Based on Phase-Domain and Voltage-Domain Hybrid Calibration
IF 2.8, Zone 2, Engineering Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 33, No. 5, pp. 1442-1446. Pub Date: 2024-11-22. DOI: 10.1109/TVLSI.2024.3496931
Feng Bu; Ruixue Ding; Depeng Sun; Ge Wang; Yuan Gao; Rong Zhou; Xiaoteng Zhao; Lisheng Chen; Shubin Liu; Zhangming Zhu
Abstract: This brief proposes a 7.4–9.2-GHz low-noise fractional-N differential sampling phase-locked loop (DSPLL), which features doubled phase detector (PD) gain. By using the phase-domain and voltage-domain hybrid calibration, the accumulated quantization error (Q-error) of the delta-sigma modulator (DSM) is compensated, and the locking problem caused by large sampling voltage fluctuation is solved. Meanwhile, a voltage shifting technique is introduced to adjust the locked voltage region of the differential sampling PD (DSPD), which can improve the linearity of the DSPLL for better calibration. Fabricated in a 65-nm CMOS process, the presented DSPLL achieves measured integrated jitter of 69.09 and 73.26 fs for integer-N and fractional-N modes, respectively. The reference spur is −72.96 dBc, and the worst fractional spur is −55.26 dBc. The total power consumption is 19.2 mW at a 1.2-V supply, achieving a figure of merit jitter (FOMJ) of −249.9 dB.
Citations: 0
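The abstract above does not state how the jitter figure of merit is computed. Assuming the commonly used definition FOM = 10·log10[(σ / 1 s)² · (P / 1 mW)], the short sketch below reproduces the reported −249.9 dB from the fractional-N jitter and the total power; the function name is an assumption for this example.

```python
import math


def fom_jitter_db(rms_jitter_s, power_mw):
    """Jitter FoM under the assumed definition: 10*log10((sigma/1 s)^2 * (P/1 mW))."""
    return 10.0 * math.log10((rms_jitter_s ** 2) * power_mw)


if __name__ == "__main__":
    power_mw = 19.2
    print("fractional-N:", round(fom_jitter_db(73.26e-15, power_mw), 2))
    print("integer-N:   ", round(fom_jitter_db(69.09e-15, power_mw), 2))
```

Under this assumed definition, the fractional-N jitter of 73.26 fs gives about −249.9 dB, which suggests the reported figure refers to fractional-N mode; the integer-N jitter would give a slightly better value of about −250.4 dB.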
An On-Chip Low-Cost Averaging Digital Sampling Scope for 80-GS/s Measurement of Wireline Pulse Responses
IF 2.8, Zone 2, Engineering Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 33, No. 5, pp. 1432-1436. Pub Date: 2024-11-21. DOI: 10.1109/TVLSI.2024.3497213
Won Joon Choi; Myungguk Lee; Junung Choi; Jaeik Cho; Gain Kim; Byungsub Kim
Abstract: Determining a channel’s characteristics is a fundamental step for designing a high-speed link system. By identifying the properties of the channel, designers can gain insights into how to transmit a signal with low distortion and optimize a transceiver’s architecture. As the channel’s characteristics can be identified by analyzing its single-bit pulse response (PR), obtaining an accurate PR plot is critical for reliable channel characterization. Therefore, it is preferred to measure the PR in situ to minimize the parasitic effects. In this work, we introduce a novel approach for measuring PR in situ, designed to quickly and accurately generate undistorted plot results. To prove the efficacy of the proposed method, we designed an on-chip sampling scope circuit and fabricated a test chip in 28-nm CMOS technology. While being able to measure a distortion-free PR, the proposed method demonstrates a pulse acquisition rate more than $10^{5}$ times faster than prior art.
Citations: 0
Efficient Hardware Accelerator Based on Medium Granularity Dataflow for SpTRSV
IF 2.8, Zone 2, Engineering Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 33, No. 3, pp. 807-820. Pub Date: 2024-11-20. DOI: 10.1109/TVLSI.2024.3497166
Qian Chen; Xiaofeng Yang; Shengli Lu
Abstract: Sparse triangular solve (SpTRSV) is widely used in various domains. Numerous studies have been conducted using CPUs, GPUs, and specific hardware accelerators, where dataflows can be categorized into coarse and fine granularity. Coarse dataflows offer good spatial locality but suffer from low parallelism, while fine dataflows provide high parallelism but disrupt the spatial structure, leading to increased nodes and poor data reuse. This article proposes a novel hardware accelerator for SpTRSV or SpTRSV-like directed acyclic graphs (DAGs). The accelerator implements a medium granularity dataflow through hardware-software codesign and achieves both excellent spatial locality and high parallelism. In addition, a partial sum caching mechanism is introduced to reduce the blocking frequency of processing elements (PEs), and a reordering algorithm of intranode edges’ computation is developed to enhance data reuse. Experimental results on 245 benchmarks with node counts reaching up to 85392 demonstrate that this work achieves average performance improvements of $7.0\times$ (up to $27.8\times$) over CPUs and $5.8\times$ (up to $98.8\times$) over GPUs. Compared with the state-of-the-art technique (DPU-v2), this work shows a $2.5\times$ (up to $5.9\times$) average performance improvement and $1.7\times$ (up to $4.1\times$) average energy efficiency enhancement.
Citations: 0
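As background for the abstract above, the sketch below shows a plain sequential SpTRSV (forward substitution on a sparse lower-triangular matrix in CSR form) together with the dependency level of each row in the underlying DAG. It is a software baseline for illustration only, not the accelerator's medium granularity dataflow; the function names and the small example matrix are assumptions.

```python
def sptrsv_lower_csr(indptr, indices, data, b):
    """Solve L x = b for a sparse lower-triangular L stored in CSR.
    Assumes every row stores a nonzero diagonal and column indices <= row index."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        acc = b[i]
        diag = None
        for p in range(indptr[i], indptr[i + 1]):
            j = indices[p]
            if j == i:
                diag = data[p]
            else:
                acc -= data[p] * x[j]  # x[j] is already known because j < i
        x[i] = acc / diag
    return x


def dependency_levels(indptr, indices):
    """Level of each row in the dependency DAG: rows on the same level have no
    mutual dependencies and could be solved in parallel."""
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        for p in range(indptr[i], indptr[i + 1]):
            j = indices[p]
            if j != i:
                level[i] = max(level[i], level[j] + 1)
    return level


if __name__ == "__main__":
    # L = [[2,0,0,0],[1,1,0,0],[0,0,4,0],[0,3,1,2]] in CSR form.
    indptr = [0, 1, 3, 4, 7]
    indices = [0, 0, 1, 2, 1, 2, 3]
    data = [2.0, 1.0, 1.0, 4.0, 3.0, 1.0, 2.0]
    b = [2.0, 3.0, 8.0, 13.0]
    print(sptrsv_lower_csr(indptr, indices, data, b))
    print(dependency_levels(indptr, indices))
```

In the example, rows 0 and 2 sit on level 0 and are independent, while row 3 must wait for rows 1 and 2. This is the parallelism-versus-locality tension that the coarse and fine dataflows in the abstract trade off.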
Deep Learning-Based Performance Testing for Analog Integrated Circuits
IF 2.8, Zone 2, Engineering Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 33, No. 4, pp. 1187-1191. Pub Date: 2024-11-20. DOI: 10.1109/TVLSI.2024.3496777
Jiawei Cao; Chongtao Guo; Houjun Wang; Zhigang Wang; Hao Li; Geoffrey Ye Li
Abstract: In this brief, we propose a deep learning-based performance testing framework to minimize the number of required test modules while guaranteeing the accuracy requirement, where a test module corresponds to a combination of one circuit and one stimulus. First, we apply a deep neural network (DNN) to establish the mapping from the response of the circuit under test (CUT) in each module to all specifications to be tested. Then, the required test modules are selected by solving a 0–1 integer programming problem. Finally, the predictions from the selected test modules are combined by a DNN to form the specification estimations. The simulation results validate the proposed approach in terms of testing accuracy and cost.
Citations: 0
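The module-selection step in the abstract above is described only as a 0–1 integer program. As a heavily simplified stand-in, the sketch below treats it as a minimum set cover: each candidate module is assumed to predict some subset of specifications within tolerance, and the smallest set of modules covering every specification is found by exhaustive search. The coverage model, the module names, and the function name are all hypothetical; the brief's actual formulation and constraints may differ.

```python
from itertools import combinations


def select_modules(covers, num_specs):
    """Smallest set of modules whose covered specifications include all specs.

    covers: dict mapping a module name to the set of specification indices it is
    ASSUMED to predict within tolerance. Exhaustive search over binary choices,
    so only suitable for a handful of modules.
    """
    required = set(range(num_specs))
    modules = list(covers)
    for size in range(1, len(modules) + 1):
        for subset in combinations(modules, size):
            covered = set().union(*(covers[m] for m in subset))
            if covered >= required:
                return subset
    return None  # no combination covers every specification


if __name__ == "__main__":
    covers = {"m0": {0, 1}, "m1": {1, 2}, "m2": {2, 3}, "m3": {0, 3}}
    print(select_modules(covers, num_specs=4))
```

For realistic problem sizes a dedicated integer programming solver would replace the exhaustive search; the brute-force loop is used here only to keep the example dependency-free.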
FPGA-Based Low-Bit and Lightweight Fast Light Field Depth Estimation
IF 2.8, Zone 2, Engineering Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 33, No. 1, pp. 88-101. Pub Date: 2024-11-19. DOI: 10.1109/TVLSI.2024.3496751
Jie Li; Chuanlun Zhang; Wenxuan Yang; Heng Li; Xiaoyan Wang; Chuanjun Zhao; Shuangli Du; Yiguang Liu
Abstract: The 3-D vision computing is a key application in unmanned systems, satellites, and planetary rovers. Learning-based light field (LF) depth estimation is one of the major research directions in 3-D vision computing. However, conventional learning-based depth estimation methods involve a large number of parameters and floating-point operations, making it challenging to achieve low-power, fast, and high-precision LF depth estimation on a field-programmable gate array (FPGA). Motivated by this issue, an FPGA-based low-bit, lightweight LF depth estimation network (L$^{3}$FNet) is proposed. First, a hardware-friendly network is designed, which has small weight parameters, low computational load, and a simple network architecture with minor accuracy loss. Second, we apply efficient hardware unit design and software-hardware collaborative dataflow architecture to construct an FPGA-based fast, low-bit acceleration engine. Experimental results show that compared with the state-of-the-art works with lower mean-square error (MSE), L$^{3}$FNet can reduce the computational load by more than 109 times and weight parameters by approximately 78 times. Moreover, on the ZCU104 platform, it requires 95.65% lookup tables (LUTs), 80.67% digital signal processors (DSPs), 80.93% BlockRAM (BRAM), 58.52% LUTRAM, and 9.493-W power consumption to achieve an efficient acceleration engine with a latency as low as 272 ns. The code and model of the proposed method are available at https://github.com/sansi-zhang/L3FNet.
Citations: 0
A 22-nm All-Digital Time-Domain Neural Network Accelerator for Precision In-Sensor Processing
IF 2.8, Zone 2, Engineering Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 32, No. 12, pp. 2220-2231. Pub Date: 2024-11-19. DOI: 10.1109/TVLSI.2024.3496090
Ahmed M. Mohey; Jelin Leslin; Gaurav Singh; Marko Kosunen; Jussi Ryynänen; Martin Andraud
Abstract: Deep neural network (DNN) accelerators are increasingly integrated into sensing applications, such as wearables and sensor networks, to provide advanced in-sensor processing capabilities. Given wearables’ strict size and power requirements, minimizing the area and energy consumption of DNN accelerators is a critical concern. In that regard, computing DNN models in the time domain is a promising architecture, taking advantage of both technology scaling friendliness and efficiency. Yet, time-domain accelerators are typically not fully digital, limiting the full benefits of time-domain computation. In this work, we propose an all-digital time-domain accelerator with a small size and low energy consumption to target precision in-sensor processing like human activity recognition (HAR). The proposed accelerator features a simple and efficient architecture without dependencies on analog nonidealities such as leakage and charge errors. An eight-neuron layer (core computation layer) is implemented in 22-nm FD-SOI technology. The layer occupies $70 \times 70\,\mu$m while supporting multibit inputs (8-bit) and weights (8-bit) with signed accumulation up to 18 bits. The power dissipation of the computation layer is 576 $\mu$W at 0.72-V supply and 500-MHz clock frequency, achieving an average area efficiency of 24.74 GOPS/mm² (up to 544.22 GOPS/mm²), an average energy efficiency of 0.21 TOPS/W (up to 4.63 TOPS/W), and a normalized energy efficiency of 13.46 1b-TOPS/W (up to 296.30 1b-TOPS/W).
Citations: 0
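The efficiency figures in the abstract above can be cross-checked with simple arithmetic. The sketch below assumes the layer footprint is 70 µm by 70 µm and that the 1b-normalized figure scales the energy efficiency by input bits times weight bits (8 × 8); neither assumption is stated explicitly in the abstract, and the function names are made up for this example.

```python
def throughput_gops_from_area(area_eff_gops_per_mm2, side_um):
    """Throughput implied by area efficiency, assuming a square side_um x side_um layer."""
    area_mm2 = (side_um * 1e-3) ** 2
    return area_eff_gops_per_mm2 * area_mm2


def throughput_gops_from_power(energy_eff_tops_per_w, power_uw):
    """Throughput implied by energy efficiency and power dissipation."""
    ops_per_second = energy_eff_tops_per_w * 1e12 * power_uw * 1e-6
    return ops_per_second / 1e9


def normalized_1b_tops_per_w(tops_per_w, input_bits, weight_bits):
    """ASSUMED normalization: scale by the product of input and weight bit widths."""
    return tops_per_w * input_bits * weight_bits


if __name__ == "__main__":
    print("GOPS from area:  ", throughput_gops_from_area(24.74, 70))
    print("GOPS from power: ", throughput_gops_from_power(0.21, 576))
    print("1b-TOPS/W (avg): ", normalized_1b_tops_per_w(0.21, 8, 8))
    print("1b-TOPS/W (peak):", normalized_1b_tops_per_w(4.63, 8, 8))
```

Both routes give an average throughput of roughly 0.12 GOPS, and the assumed 8 × 8 scaling lands within rounding of the reported 13.46 and 296.30 1b-TOPS/W.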