IEEE Transactions on Very Large Scale Integration (VLSI) Systems: Latest Articles

ABS: Accumulation Bit-Width Scaling Method for Designing Low-Precision Tensor Core
IF 2.8 | CAS Zone 2 | Engineering and Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Pub Date: 2024-06-25, DOI: 10.1109/TVLSI.2024.3414260
Yasong Cao, Mei Wen, Zhongdi Luo, Xin Ju, Haolan Huang, Junzhong Shen, Haiyan Chen
Abstract: A big gap exists between the computational demand of deep neural network (DNN) applications and the computing power of DNN accelerators. Low-precision floating-point (LP-FP) computation is one of the important means of improving the performance of DNN training and inference. However, high-precision accumulators are typically used to sum the dot products during general matrix multiplication (GEMM) in tensor cores (TCs). As data precision decreases, the accumulator becomes the main consumer of the multiply-accumulate (MAC) unit's area and power, so reducing the accumulators' bit-width is important for improving the area and energy efficiency of TCs. There are two main challenges: 1) theoretical support for identifying the floating-point (FP) format with the lowest bit-width for TC accumulators and 2) integrating the LP-FP TC into a DNN training and inference framework to evaluate its benefits. In this article, we propose accumulation bit-width scaling (ABS), a novel method to guide the design of LP-FP TCs. We 1) implement this method by constructing a novel variance retention ratio (VRR) model that predicts the FP format with the minimum bit-width for the TC's accumulator; 2) provide a generator of DNN accelerators based on a systolic-array (SA) TC that supports many low-precision configurations; and 3) design an LP-FP DNN execution framework that supports a software-simulation mode and a hardware-accelerator mode for running LP-FP DNN tasks. The experimental results show that the LP-FP TC guided by the ABS method reduces area and power consumption by up to 76.47% and 75.60%, respectively, compared with advanced TCs.
Citations: 0
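The abstract argues that accumulator width dominates MAC cost at low precision, which raises the question of how few accumulator mantissa bits a dot product can tolerate. The sketch below is not the authors' VRR model; it is a minimal illustration of "swamping" in a narrow accumulator, assuming round-to-nearest-even after every product and every addition and an unbounded exponent range (no overflow or subnormals). The helpers `round_mantissa` and `lp_dot` are hypothetical names.

```python
import math

def round_mantissa(x: float, mant_bits: int) -> float:
    """Round x to a float with `mant_bits` explicit fraction bits plus a hidden bit.

    Assumption: round-to-nearest-even, exponent range ignored.
    """
    if x == 0.0 or not math.isfinite(x):
        return x
    frac, exp = math.frexp(x)      # x = frac * 2**exp with 0.5 <= |frac| < 1
    scale = mant_bits + 1          # hidden bit + explicit fraction bits
    # Python's round() on floats rounds half to even, like IEEE 754 RNE
    return math.ldexp(round(frac * 2**scale), exp - scale)

def lp_dot(a, b, prod_bits: int, acc_bits: int) -> float:
    """Dot product with rounded products and a rounded running accumulator."""
    acc = 0.0
    for x, y in zip(a, b):
        p = round_mantissa(x * y, prod_bits)
        acc = round_mantissa(acc + p, acc_bits)  # accumulator rounded after each add
    return acc

if __name__ == "__main__":
    ones = [1.0] * 4096
    for bits in (23, 10, 7):   # FP32-, FP16- and BF16-like mantissa widths
        print(bits, lp_dot(ones, ones, prod_bits=23, acc_bits=bits))
```

Summing 4096 products of 1.0, the 23-bit accumulator returns 4096.0, while the 10-bit one stalls at 2048.0 and the 7-bit one at 256.0, because each further +1 falls below half an ulp and rounds away. A model such as VRR aims to predict the narrowest format that avoids this loss for a given accumulation length.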
A 28-nm Dual-Mode Explicit Class-F₂₃ VCO With Low-Loss CM Return Path Achieving 70–400-kHz 1/f³ PN Corner Over 4.9–7.3-GHz TR
IF 2.8 | CAS Zone 2 | Engineering and Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Pub Date: 2024-06-25, DOI: 10.1109/TVLSI.2024.3414158
Shan Lu, Danyu Wu, Xuan Guo, Hanbo Jia, Yong Chen, Xinyu Liu
Abstract: This brief presents an explicit Class-F₂₃ voltage-controlled oscillator (VCO). A square-like voltage waveform is obtained via waveform shaping, and flicker-noise upconversion is suppressed by a proper common-mode (CM) return path. CM resonance at the second-harmonic frequency is introduced by a compact octagonal inductor. Class-F₂₃ operation significantly reduces the rms value of the impulse sensitivity function (ISF). The VCO switches between two modes of a high-order LC resonator consisting of two identical LC tanks coupled by capacitors. A prototype is implemented in 28-nm CMOS. Measurements show a continuous tuning range (TR) of 4.89–7.29 GHz, with a peak figure of merit (FoM) of 190.5 dB/Hz at 5.8 GHz and an FoM better than 188.5 dB/Hz across the entire TR. The flicker phase-noise corner ranges from 70 to 400 kHz. The VCO consumes 16–19 mW from a 0.5-V supply and occupies an active area of 0.21 mm².
Citations: 0
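The FoM quoted above is usually computed with the conventional oscillator definition FoM = -L(Δf) + 20·log10(f0/Δf) - 10·log10(P_DC/1 mW). The sketch below evaluates that definition and the fractional tuning range; the listing does not give the measured phase noise or offset, so the -120 dBc/Hz at 1 MHz and 10 mW operating point is purely hypothetical, and the paper may normalize slightly differently.

```python
import math

def vco_fom(pn_dbc_hz: float, f0_hz: float, offset_hz: float, p_dc_w: float) -> float:
    """Conventional oscillator figure of merit in dB/Hz (larger is better).

    FoM = -L(df) + 20*log10(f0/df) - 10*log10(P_DC / 1 mW)
    """
    return (-pn_dbc_hz
            + 20 * math.log10(f0_hz / offset_hz)
            - 10 * math.log10(p_dc_w / 1e-3))

def tuning_range_pct(f_min_hz: float, f_max_hz: float) -> float:
    """Tuning range as a percentage of the center frequency."""
    return 200.0 * (f_max_hz - f_min_hz) / (f_max_hz + f_min_hz)

if __name__ == "__main__":
    # Hypothetical operating point: phase noise and power are NOT from the paper
    print(f"FoM = {vco_fom(-120.0, 5.8e9, 1e6, 10e-3):.1f} dB/Hz")
    # Tuning range reported in the abstract: 4.89-7.29 GHz
    print(f"TR  = {tuning_range_pct(4.89e9, 7.29e9):.1f} %")
```

For the reported 4.89–7.29 GHz span, the tuning range works out to about 39.4% of the 6.09 GHz center frequency; the hypothetical point gives an FoM of about 185.3 dB/Hz.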
1.63 pJ/SOP Neuromorphic Processor With Integrated Partial Sum Routers for In-Network Computing
IF 2.8 | CAS Zone 2 | Engineering and Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Pub Date: 2024-06-24, DOI: 10.1109/TVLSI.2024.3409652
Dongrui Li, Ming Ming Wong, Yi Sheng Chong, Jun Zhou, Mohit Upadhyay, Ananta Balaji, Aarthy Mani, Weng Fai Wong, Li Shiuan Peh, Anh Tuan Do, Bo Wang
Citations: 0
A 206 μW Vital Signs Monitoring System on Chip for Measuring Five Vitals
IF 2.8 | CAS Zone 2 | Engineering and Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Pub Date: 2024-06-24, DOI: 10.1109/TVLSI.2024.3415469
Sameen Minto, Austin Cable, Wala Saadeh
Abstract: This article presents an area- and power-efficient system-on-chip (SoC) for remote vital signs monitoring. It measures five important vitals: blood oxygen saturation (SpO2), respiration rate (RR), heart rate (HR), HR variability (HRV), and temperature. The SoC uses a photoplethysmography (PPG) signal to compute HR, HRV, SpO2, and RR. The PPG signal is amplified and filtered by a PPG readout consisting of a transimpedance amplifier (TIA) with a switched integrator (SI). A differential second-order delta-sigma analog-to-digital converter (ΔΣ-ADC) digitizes the PPG signal. The SoC also includes a low-power LED driver for both red and infrared (IR) LEDs, which operate in pulsed mode with a 0.625% duty cycle. A vital signs extractor performs feature extraction (FE) and computes the vital signs with a maximum absolute error of less than 1%. Temperature is measured by a Wheatstone bridge (WhB)-based temperature sensor that integrates thermal resistors into a second-order ΔΣ-ADC. The system shares the ΔΣ-ADC between the PPG signal and the temperature readings to reduce both area and power consumption. It measures temperature over the human body temperature range (32 °C to 42 °C) with an accuracy of ±0.09 °C. The SoC is implemented in a 180-nm CMOS process, occupies 4.8 mm², and consumes 206 μW.
Citations: 0
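The abstract does not say how SpO2 is derived from the red and IR PPG channels. A widely used textbook approach is the "ratio of ratios" R = (AC_red/DC_red)/(AC_IR/DC_IR), mapped to SpO2 through an empirically calibrated curve. The sketch below assumes that approach; the linear calibration 110 - 25·R is a commonly quoted approximation used here only as a hypothetical placeholder, not the paper's calibration, and the peak-to-peak/mean AC/DC estimate is deliberately crude. It also shows how the 0.625% LED duty cycle from the abstract scales average LED current, with a hypothetical 10 mA peak current.

```python
def ppg_ac_dc(samples):
    """Crude AC/DC estimate of one PPG window: peak-to-peak amplitude and mean."""
    dc = sum(samples) / len(samples)
    ac = max(samples) - min(samples)
    return ac, dc

def ratio_of_ratios(ac_red, dc_red, ac_ir, dc_ir):
    """R = (AC_red / DC_red) / (AC_ir / DC_ir)."""
    return (ac_red / dc_red) / (ac_ir / dc_ir)

def spo2_estimate(r, a=110.0, b=25.0):
    """Hypothetical linear calibration SpO2 = a - b*R (real devices fit this curve)."""
    return a - b * r

def led_average_current(peak_a, duty_pct):
    """Average LED current for a pulsed drive with the given duty cycle in percent."""
    return peak_a * duty_pct / 100.0

if __name__ == "__main__":
    r = ratio_of_ratios(ac_red=0.01, dc_red=1.0, ac_ir=0.02, dc_ir=1.0)
    print(f"R = {r:.2f}, SpO2 ~ {spo2_estimate(r):.1f} %")
    # 0.625 % duty cycle from the abstract; 10 mA peak is hypothetical
    print(f"avg LED current = {led_average_current(10e-3, 0.625) * 1e6:.1f} uA")
```

With the illustrative inputs, R = 0.5 maps to about 97.5%, and pulsing a 10 mA LED at 0.625% duty averages 62.5 μA, which is why duty cycling matters for a sub-milliwatt SoC.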
VLSI Design of Light-Field Factorization for Dual-Layer Factored Display
IF 2.8 | CAS Zone 2 | Engineering and Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Pub Date: 2024-06-24, DOI: 10.1109/TVLSI.2024.3414262
Li-De Chen, Li-Qun Weng, Hao-Chien Cheng, An-Yu Cheng, Kai-Ping Lin, Chao-Tsung Huang
Citations: 0
Enabling Efficient Hybrid Systolic Computation in Shared-L1-Memory Manycore Clusters
IF 2.8 | CAS Zone 2 | Engineering and Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Pub Date: 2024-06-24, DOI: 10.1109/TVLSI.2024.3415486
Sergio Mazzola, Samuel Riedel, Luca Benini
Abstract: Systolic arrays and shared-L1-memory manycore clusters are commonly used architectural paradigms that offer different trade-offs for accelerating parallel workloads. The former excel at regular dataflow at the cost of rigid architectures and complex programming models, whereas the latter are versatile and easy to program but require explicit dataflow management and synchronization. This work aims to enable efficient systolic execution on shared-L1-memory manycore clusters. We devise a flexible architecture in which small, energy-efficient RISC-V cores act as the systolic array's processing elements (PEs) and can form diverse, reconfigurable systolic topologies through queues mapped in the cluster's shared memory. We introduce two low-overhead RISC-V instruction set architecture (ISA) extensions for efficient systolic execution, Xqueue and queue-linked registers (QLRs), which support queue management in hardware. The Xqueue extension enables single-instruction access to shared-memory-mapped queues, while QLRs allow implicit and autonomous access to them, relieving the cores of explicit communication instructions. We demonstrate Xqueue and QLRs in MemPool, an open-source shared-memory cluster with 256 PEs, and analyze the trade-offs of the hybrid systolic-shared-memory architecture on several digital signal processing (DSP) kernels with diverse arithmetic intensity. For an area increase of just 6%, our hybrid architecture can double MemPool's compute unit utilization, reaching up to 73%. In typical conditions (TT/0.80 V/25 °C) in a 22-nm FDX technology, our hybrid architecture runs at 600 MHz with no frequency degradation and is up to 65% more energy efficient than the shared-memory baseline, achieving up to 208 GOPS/W, with up to 63% of power spent in the PEs.
Citations: 0
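The queue-based systolic dataflow described above can be illustrated with a software model in which each PE only pops from its input queues and pushes to the next PE's queues, with no shared global state. The sketch below is a hypothetical Python model of a 1-D systolic FIR filter, not MemPool code and not the Xqueue or QLR ISA: `deque` objects stand in for shared-memory-mapped queues, each PE holds one weight, forwards its previous input sample (a one-sample delay) and adds its product to the incoming partial sum.

```python
from collections import deque

def systolic_fir(weights, samples):
    """Compute y[n] = sum_k w[k] * x[n-k] with a chain of queue-linked PEs."""
    n_pe = len(weights)
    # Queue k feeds PE k; queues at index n_pe collect the chain's outputs.
    x_q = [deque() for _ in range(n_pe + 1)]   # delayed input samples
    p_q = [deque() for _ in range(n_pe + 1)]   # partial sums
    prev_x = [0] * n_pe                        # per-PE delay register (x[-1] = 0)
    outputs = []
    for s in samples:
        x_q[0].append(s)   # the producer feeds a new sample
        p_q[0].append(0)   # and an empty partial sum
        for k, w in enumerate(weights):
            x = x_q[k].popleft()                # PE k sees x[n-k]
            psum = p_q[k].popleft()             # sum of w[j]*x[n-j] for j < k
            p_q[k + 1].append(psum + w * x)     # pass the partial sum on
            x_q[k + 1].append(prev_x[k])        # pass x delayed by one sample
            prev_x[k] = x
        outputs.append(p_q[n_pe].popleft())
        x_q[n_pe].clear()  # samples leaving the last PE are not needed
    return outputs

if __name__ == "__main__":
    print(systolic_fir([1, 2, 3], [1, 0, 0, 0]))   # impulse response
```

Feeding an impulse returns the weights, `[1, 2, 3, 0]`. Because each PE touches only its own queues, the same loop body could run on independent cores; in the paper, Xqueue and QLRs are what make those queue pushes and pops cheap in hardware.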
Analysis and Optimization of Sense-and-Set Piezoelectric Energy Harvesting Interface Circuits
IF 2.8 | CAS Zone 2 | Engineering and Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Pub Date: 2024-06-20, DOI: 10.1109/TVLSI.2024.3409668
Loai G. Salem
Abstract: This article presents the modeling and optimization of a sense-and-set (SaS) rectifier. The basic equations governing SaS rectifier operation are derived analytically using Laplace-transform techniques. An expression for the harvesting efficiency of a SaS rectifier is developed by evaluating the conduction and gate-drive losses as well as the output power of the rectifier. The derived expressions are then used to locate the optimal design point of a SaS interface circuit. The proposed modeling approach reduces run time by more than 2000× compared with SPICE simulation without sacrificing accuracy. The following design parameters are determined for maximum efficiency: the optimal relative size of the rectifier switches, the total conductance of the rectifier, and the sensing frequency. The close match between the theoretical expressions and circuit simulation results validates the proposed analysis.
Citations: 0
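The abstract frames the total rectifier conductance as a trade-off between conduction loss and gate-drive loss. The paper's Laplace-domain expressions are not reproduced in this listing, so the sketch below uses a generic first-order switch-sizing model instead, assuming conduction loss I_rms²/G and gate-drive loss c·G·V_g²·f (gate capacitance grows with switch width, hence with conductance). Setting dP/dG = 0 gives G* = I_rms / (V_g·sqrt(c·f)). All parameter names and values are hypothetical.

```python
import math

def loss_terms(g, i_rms, c_per_s, v_gate, f_sw):
    """Return (conduction loss, gate-drive loss) in watts for conductance g (S).

    Hypothetical model: c_per_s is gate capacitance per siemens of switch conductance.
    """
    p_cond = i_rms**2 / g
    p_gate = c_per_s * g * v_gate**2 * f_sw
    return p_cond, p_gate

def total_loss(g, i_rms, c_per_s, v_gate, f_sw):
    return sum(loss_terms(g, i_rms, c_per_s, v_gate, f_sw))

def optimal_conductance(i_rms, c_per_s, v_gate, f_sw):
    """Minimizer of total_loss: d/dG (I^2/G + c*G*V^2*f) = 0."""
    return i_rms / (v_gate * math.sqrt(c_per_s * f_sw))

if __name__ == "__main__":
    args = dict(i_rms=100e-6, c_per_s=1e-10, v_gate=1.0, f_sw=1e3)  # hypothetical
    g_opt = optimal_conductance(**args)
    for g in (0.5 * g_opt, g_opt, 2 * g_opt):
        print(f"G = {g:.3f} S -> loss = {total_loss(g, **args) * 1e9:.1f} nW")
```

At G*, the two loss terms are equal, and halving or doubling the conductance increases total loss. The paper's model additionally accounts for switch size ratios and the sensing frequency, which this sketch does not capture.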
A Dual-Mode Continuous–Time Sigma-Delta Modulator With a Reconfigurable Loop Filter Based on a Single Op-Amp Resonator
IF 2.8 | CAS Zone 2 | Engineering and Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Pub Date: 2024-06-20, DOI: 10.1109/TVLSI.2024.3414298
Young-Kyun Cho
Abstract: This brief proposes a dual-mode continuous-time (CT) sigma-delta modulator (SDM) for switched-mode power supplies that comprises a switchable loop filter (LF) based on a single op-amp resonator (SOR). The modulator adaptively switches the LF between third- and second-order architectures and optimizes the noise transfer function (NTF) using partial resistors according to the sampling frequency. This achieves the desired bandwidth and resolution while reducing design complexity and minimizing the need for tuning circuitry. Moreover, implementing the LF with the SOR improves both the power and area efficiency of the modulator in each operating mode by reducing the number of active components. The modulator was fabricated in a 0.18-μm CMOS process with an active area of 0.105 mm². It achieved peak signal-to-noise ratios (SNRs) of 66.0/65.3 dB for signal bandwidths of 0.5/1.1 MHz, consuming 127/280 μW from a 1.8-V supply when clocked at 40/160 MHz. The figures of merit for the two modes were 82/93 fJ/conv.-step.
Citations: 0
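Figures in fJ/conv.-step are normally Walden FoMs: FoM = P / (2·BW·2^ENOB), with ENOB = (SNDR - 1.76)/6.02. The sketch below applies that formula to the numbers in the abstract, assuming the Walden definition. Because the listing reports peak SNR rather than SNDR, SNR is substituted here, so the result is only indicative.

```python
def enob(sndr_db: float) -> float:
    """Effective number of bits from SNDR in dB."""
    return (sndr_db - 1.76) / 6.02

def walden_fom(power_w: float, bw_hz: float, sndr_db: float) -> float:
    """Walden figure of merit in joules per conversion step."""
    return power_w / (2 * bw_hz * 2 ** enob(sndr_db))

if __name__ == "__main__":
    # (power, signal bandwidth, peak SNR) per mode, from the abstract;
    # SNR stands in for SNDR, which the listing does not give.
    modes = {"mode 1": (127e-6, 0.5e6, 66.0), "mode 2": (280e-6, 1.1e6, 65.3)}
    for name, (p, bw, snr) in modes.items():
        print(f"{name}: {walden_fom(p, bw, snr) * 1e15:.1f} fJ/conv.-step")
```

This gives roughly 77.9 and 84.6 fJ/conv.-step, somewhat below the reported 82/93. A plausible explanation is that the paper uses SNDR, which is usually a little lower than SNR, but the listing does not confirm this.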
Low-Precision Mixed-Computation Models for Inference on Edge
IF 2.8 | CAS Zone 2 | Engineering and Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Pub Date: 2024-06-20, DOI: 10.1109/TVLSI.2024.3409640
Seyedarmin Azizi, Mahdi Nazemi, Mehdi Kamal, Massoud Pedram
Abstract: This article presents a mixed-computation neural network processing approach for edge applications that combines low-precision (low-bit-width) Posit and low-precision fixed-point (FixP) number systems. The approach uses 4-bit Posit (Posit4), which has higher precision around 0, to represent highly sensitive weights, and 4-bit FixP (FixP4) to represent the remaining weights. A heuristic that analyzes the importance and quantization error of the weights assigns the proper number system to each weight. In addition, a gradient approximation for the Posit representation is introduced to improve the quality of weight updates during backpropagation. Because fully Posit-based computation consumes a lot of energy, neural network operations are carried out in FixP or mixed Posit/FixP. An efficient hardware implementation of a MAC operation whose first operand is Posit and whose second operand and accumulator are FixP is presented. The efficacy of the proposed low-precision mixed-computation approach is extensively assessed on vision and language models. The results show that, on average, the mixed-computation approach is about 1.5% more accurate than FixP at a cost of 0.19% energy overhead.
Citations: 0
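To see how Posit4 and FixP4 place their 16 code points differently, the sketch below decodes posits using the standard posit rules (sign, run-length regime, optional exponent bits, fraction with a hidden 1; negative values via two's complement). The listing does not state the exponent size, so es = 0 is an assumption, and the FixP4 format with 2 fractional bits is likewise hypothetical.

```python
import math

def decode_posit(bits: int, n: int = 4, es: int = 0) -> float:
    """Decode an n-bit posit with `es` exponent bits to a Python float."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return math.nan                      # NaR (not a real)
    sign = bits >> (n - 1)
    if sign:
        bits = (-bits) & mask                # two's complement for negatives
    i = n - 2                                # index of first bit after the sign
    first = (bits >> i) & 1
    run = 0
    while i >= 0 and ((bits >> i) & 1) == first:
        run += 1                             # regime: run of identical bits
        i -= 1
    k = run - 1 if first else -run
    i -= 1                                   # skip the regime terminator bit
    e = 0
    for _ in range(es):                      # exponent bits (missing bits are 0)
        e <<= 1
        if i >= 0:
            e |= (bits >> i) & 1
            i -= 1
    frac_bits = max(i + 1, 0)
    f = bits & ((1 << frac_bits) - 1)
    value = 2.0 ** (k * (1 << es) + e) * (1 + f / (1 << frac_bits))
    return -value if sign else value

def fixp4_values(frac_bits: int = 2):
    """All values of a signed 4-bit fixed-point format (hypothetical Q1.2)."""
    return [k / (1 << frac_bits) for k in range(-8, 8)]

if __name__ == "__main__":
    print("Posit4 (es=0) positives:", sorted(decode_posit(b) for b in range(1, 8)))
    print("FixP4 (2 frac bits):", fixp4_values())
```

With these assumptions, the positive Posit4 values are 0.25, 0.5, 0.75, 1, 1.5, 2 and 4 (tapered spacing and a wider dynamic range), whereas FixP4 spans -2 to 1.75 in uniform steps of 0.25. That difference is what makes assigning each weight to one format or the other worthwhile.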
Enhancing ConvNets With ConvFIFO: A Crossbar PIM Architecture Based on Kernel-Stationary First-In-First-Out Dataflow
IF 2.8 | CAS Zone 2 | Engineering and Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Pub Date: 2024-06-17, DOI: 10.1109/TVLSI.2024.3409648
Yu Qian, Liang Zhao, Fanzi Meng, Xiapeng Xu, Cheng Zhuo, Xunzhao Yin
Abstract: Convolutional neural networks (ConvNets) have long been the model of choice for computer vision (CV) problems and have recently gained renewed traction. To compute ConvNets more efficiently, process-in-memory (PIM) architectures based on emerging non-volatile memories (NVMs) such as RRAM have been widely studied. However, conventional NVM-based PIM suffers from various non-idealities, including IR drop, sneak-path currents, large analog-to-digital converter (ADC) overhead, device variations, circuit mismatch, and error propagation. In this work, we propose ConvFIFO, a crossbar-memory-based PIM architecture for ConvNets featuring a kernel-stationary dataflow. Through FIFO-type input and output buffers, smaller row-activation parallelism, and more compact ADCs, ConvFIFO maximizes the reuse rates of inputs and partial sums to achieve a more balanced trade-off among throughput, accuracy, and area/energy consumption. Using SRAM-based FIFOs as the input/output buffers, ConvFIFO achieves a systolic architecture without moving weight data, bypassing the NVM endurance limitation and minimizing partial-sum movement. Moreover, the FIFO nature of the dataflow allows flexible pipeline design and load balancing. Compared with classical NVM-based PIM architectures such as ISAAC, ConvFIFO significantly improves performance for various ConvNet models, with 1.66–1.69×/1.69–1.74×/4.23–4.79×/1.59–1.74× improvements in energy consumption, latency, Ops/W, and Ops/(s·mm²), respectively. Compared with GPUs, ConvFIFO incurs an average inference accuracy loss of only 1.82%.
Citations: 0
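The kernel-stationary FIFO idea (weights stay put while input rows stream through a buffer and are reused across output positions) can be illustrated with a line-buffer convolution. The sketch below is a plain software analogy, not the ConvFIFO crossbar design. A `deque` with `maxlen` equal to the kernel height acts as the input FIFO, the kernel never moves, and each buffered row is reused by every output row it overlaps. The function name and the valid-mode cross-correlation convention are assumptions.

```python
from collections import deque

def fifo_conv2d(image, kernel):
    """Valid-mode 2-D cross-correlation with a row FIFO and a stationary kernel."""
    kh, kw = len(kernel), len(kernel[0])
    rows = deque(maxlen=kh)        # FIFO holding the most recent kh input rows
    out = []
    for row in image:              # input rows stream in one at a time
        rows.append(row)           # the oldest row is evicted once the FIFO is full
        if len(rows) < kh:
            continue               # not enough rows buffered yet
        out_row = []
        for c in range(len(row) - kw + 1):
            acc = 0
            for i in range(kh):    # rows[0] is the oldest (top) row of the window
                for j in range(kw):
                    acc += kernel[i][j] * rows[i][c + j]
            out_row.append(acc)
        out.append(out_row)
    return out

if __name__ == "__main__":
    img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(fifo_conv2d(img, [[1, 0], [0, 1]]))
```

For the 3×3 example, the output is `[[6, 8], [12, 14]]`. Every input row is read from memory once and stays in the FIFO for as many output rows as the kernel height, mirroring the input-reuse argument in the abstract.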