IEEE Transactions on Very Large Scale Integration (VLSI) Systems: Latest Articles, Page 6

A Universal Sequential Authentication Scheme for TAPC-Based Test Standards

IF 2.8 | Zone 2, Engineering and Technology

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-04-29 DOI: 10.1109/TVLSI.2025.3562015

Guan-Rong Chen;Kuen-Jong Lee

Abstract: Integrated circuits (ICs) have become extremely complex nowadays. Therefore, multiple test standards could be employed to handle different testing scenarios. Unfortunately, this also leads to serious security problems since attackers can exploit the excellent controllability and observability of test standards to steal confidential information or disrupt the circuit’s functionality. This article proposes a universal sequential authentication scheme that is compatible with test standards employing the test access port controller (TAPC) defined in IEEE Std 1149.1. The main objective is to protect multiple TAPC-based test standards with a universal security module. In this scheme, only authorized test data can be updated to the target register to control the corresponding test standard, and only the response to authorized test data can be output. The key idea is to generate different authentication keys for different test data, and even with the same set of test data, if their input sequences are different, their authentication keys will also be different. Furthermore, we develop an irreversible obfuscation mechanism to generate fake output data to confuse attackers. Due to its irreversibility, the original correct output data cannot be deduced from the fake output data. Experimental results on a typical processor, i.e., SCR1, show that the proposed scheme causes no time overhead, and the area overhead is only 1.74%.

Published in: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 7, pp. 1972-1982. Journal Article.
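
The abstract's key idea, that the same set of test data yields different authentication keys when it is input in a different order, can be illustrated with a simple hash chain. This is only a sketch under stated assumptions: the abstract does not say how the keys are generated, so the use of SHA-256, the function name `sequence_keys`, and the seed value are all hypothetical and are not the authors' mechanism.

```python
import hashlib


def sequence_keys(seed: bytes, test_data: list[bytes]) -> list[bytes]:
    """Derive one key per test word; each key depends on all earlier words.

    Hypothetical illustration only: chaining a hash over the input history
    is one generic way to make a key order-sensitive.
    """
    keys = []
    state = seed
    for word in test_data:
        # Fold the new word into the running state, so the key for this
        # word reflects the entire input sequence seen so far.
        state = hashlib.sha256(state + word).digest()
        keys.append(state)
    return keys


seed = b"hypothetical-shared-secret"  # assumed value, not from the paper
forward = sequence_keys(seed, [b"\x01", b"\x02"])
swapped = sequence_keys(seed, [b"\x02", b"\x01"])

# Same set of test data, different order: the final keys differ.
print(forward[-1] != swapped[-1])
```

A chained construction like this also makes the key for a given test word differ from one position in the sequence to another, which is the property the abstract emphasizes.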

Citations: 0

A Novel High-Throughput FFT Processor With a Block-Level Pipeline for 5G MIMO OFDM Systems

IF 2.8 | Zone 2, Engineering and Technology

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-04-28 DOI: 10.1109/TVLSI.2025.3558947

Meiyu Liu;Zhijun Wang;Hanqing Luo;Shengnan Lin;Liping Liang

Citations: 0

A 0.6-V 9.38-Bit 6.9-kS/s Capacitor-Splitting Bypass Window SAR ADC for Wearable 12-Lead ECG Acquisition Systems

IF 2.8 | Zone 2, Engineering and Technology

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-04-28 DOI: 10.1109/TVLSI.2025.3559669

Kangkang Sun;Jingjing Liu;Feng Yan;Yuan Ren;Ruihuang Wu;Bingjun Xiong;Zhipeng Li;Jian Guan

Abstract: This article proposes a fully differential ten-bit energy-efficient successive approximation register (SAR) analog-to-digital converter (ADC) for wearable 12-lead electrocardiogram (ECG) acquisition systems. The proposed ADC structure generates two bypass windows through a capacitor-splitting technique, which can skip unnecessary quantization steps. The judgment module of the bypass windows requires only an XOR gate. By introducing redundant capacitors to participate in quantization, the total capacitance value is reduced by half. The proposed SAR ADC is fabricated using a standard 180-nm CMOS process. The measurement results show that it can achieve an effective number of bits (ENOB) of 9.38 bits and a spurious-free dynamic range (SFDR) of 76.71 dB with a supply voltage of 0.6 V at a sampling rate ($\text{F}_{\mathrm{S}}$) of 6.94 kS/s. The power consumption is 15.61 nW when subjected to a 1.17-$\text{V}_{\mathrm{PP}}$ 3.45-kHz sinusoidal input, resulting in a figure of merit (FoM) of 3.38 fJ/conv.-step. The average power consumption for quantizing 12-lead ECG signals is approximately 12.66 nW, demonstrating the ability to achieve ultralow-power quantization of ECG signals.

Published in: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 7, pp. 1838-1847. Journal Article.
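
The reported figure of merit can be checked from the other numbers in the abstract. The abstract does not name the FoM definition, so the sketch below assumes the commonly used Walden form, FoM = P / (2^ENOB × F_S); the function name `walden_fom` is illustrative. Under that assumption the quoted power, ENOB, and sampling rate reproduce the stated value.

```python
def walden_fom(power_w: float, enob_bits: float, fs_hz: float) -> float:
    """Walden figure of merit in joules per conversion step.

    Assumed definition (not stated in the abstract): P / (2**ENOB * Fs).
    """
    return power_w / (2 ** enob_bits * fs_hz)


# Values quoted in the abstract: 15.61 nW, 9.38 bits, 6.94 kS/s.
fom = walden_fom(power_w=15.61e-9, enob_bits=9.38, fs_hz=6.94e3)
print(f"{fom * 1e15:.2f} fJ/conv.-step")  # prints "3.38 fJ/conv.-step"
```

The agreement suggests, but does not prove, that the Walden definition is the one used in the article.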

Citations: 0

A High-Density eDRAM Macro With Programmable Sense Amplifier and TG-Shifter for Logical-Instruction-Based In-Memory Computing

IF 2.8 | Zone 2, Engineering and Technology

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-04-28 DOI: 10.1109/TVLSI.2025.3561507

Kunyao Lai;Enyi Yao;Zhenxing Li;Yongkui Yang

Abstract: Embedded DRAM (eDRAM) has been widely adopted as on-chip cache memory in modern processors due to its high density. In this article, we propose a 2T gain-cell eDRAM-based macro that functions not only as traditional cache memory but also as an in-memory computing unit capable of performing logic operations. Furthermore, this eDRAM macro features in situ storing, completely eliminating the need for external memory or register access during computation. The sense amplifier in this macro is equipped with a programmable voltage reference, enabling support for various Boolean logic operations, including AND/NAND, OR/NOR, and NOT. In addition, the macro integrates a transmission-gate (TG)-based shifter cluster to perform data shifting, which is commonly required in general computations. To enhance functionality, we design an instruction set that supports compound logic computations, allowing Boolean logic, shifting, and in situ storage to be executed within a single instruction. We validated this eDRAM macro in a 32-kb bitcell array using the 40-nm logic CMOS technology. Compared with state-of-the-art designs, our macro achieves a relatively high density of 729.2 kb/mm² and a competitive logic energy of 14.1 fJ/bit.

Published in: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 7, pp. 2069-2073. Journal Article.
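
To see why a programmable reference is enough to select between Boolean operations, consider a toy behavioral model. It rests on assumptions that are not in the abstract: two cells are read onto a shared line, the sensed level is idealized as the count of stored 1s, and the reference values 0.5 and 1.5 are arbitrary normalized thresholds. The names `sense`, `REFERENCES`, and `truth_table` are hypothetical, and the real circuit may work differently.

```python
def sense(a: int, b: int, vref: float, invert: bool = False) -> int:
    """Idealized sense amplifier: compare a summed level against a reference."""
    level = a + b             # assumed: level grows with the number of stored 1s
    out = int(level > vref)   # comparator decision
    return 1 - out if invert else out


# Assumed mapping from operation to (reference, use inverted output).
REFERENCES = {
    "AND": (1.5, False),   # high only when both cells store 1
    "NAND": (1.5, True),
    "OR": (0.5, False),    # high when at least one cell stores 1
    "NOR": (0.5, True),
}


def truth_table(op: str) -> list[int]:
    """Outputs for inputs (0,0), (0,1), (1,0), (1,1), in that order."""
    vref, invert = REFERENCES[op]
    return [sense(a, b, vref, invert) for a in (0, 1) for b in (0, 1)]


for op in REFERENCES:
    print(op, truth_table(op))
```

In this model the only difference between AND and OR is the threshold, which is the general idea behind reference-programmable sensing; NOT would follow from reading a single cell with the inverted output.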

Citations: 0

CapsBeam: Accelerating Capsule Network-Based Beamformer for Ultrasound Nonsteered Plane-Wave Imaging on Field-Programmable Gate Array

IF 2.8 | Zone 2, Engineering and Technology

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-04-25 DOI: 10.1109/TVLSI.2025.3559403

Abdul Rahoof;Vivek Chaturvedi;Mahesh Raveendranatha Panicker;Muhammad Shafique

Abstract: In recent years, there has been a growing trend in accelerating computationally complex nonreal-time beamforming algorithms in ultrasound imaging using deep learning models. However, due to the large size and complexity, these state-of-the-art deep learning techniques pose significant challenges when deploying on resource-constrained edge devices. In this work, we propose a novel capsule network-based beamformer called CapsBeam, designed to operate on raw radio frequency data and provide an envelope of beamformed data through nonsteered plane-wave insonification. In experiments on in vivo data, CapsBeam reduced artifacts compared to the standard Delay-and-Sum (DAS) beamforming. For in vitro data, CapsBeam demonstrated a 32.31% increase in contrast, along with gains of 16.54% and 6.7% in axial and lateral resolution compared to the DAS. Similarly, in silico data showed a 26% enhancement in contrast, along with improvements of 13.6% and 21.5% in axial and lateral resolution, respectively, compared to the DAS. To reduce the parameter redundancy and enhance the computational efficiency, we pruned the model using our multilayer look-ahead kernel pruning (LAKP-ML) methodology, achieving a compression ratio of 85% without affecting the image quality. Additionally, the hardware complexity of the proposed model is reduced by applying quantization, simplification of nonlinear operations, and parallelizing operations. Finally, we proposed a specialized accelerator architecture for the pruned and optimized CapsBeam model, implemented on a Xilinx ZU7EV FPGA. The proposed accelerator achieved a throughput of 30 GOPS for the convolution operation and 17.4 GOPS for the dynamic routing operation.

Published in: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 7, pp. 1934-1944. Journal Article.

Citations: 0

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Society Information

IF 2.8 | Zone 2, Engineering and Technology

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-04-25 DOI: 10.1109/TVLSI.2025.3557605

Citations: 0

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Publication Information

IF 2.8 | Zone 2, Engineering and Technology

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-04-25 DOI: 10.1109/TVLSI.2025.3557603

Citations: 0

Upscale Layer Acceleration on Existing AI Hardware

IF 2.8 | Zone 2, Engineering and Technology

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-04-23 DOI: 10.1109/TVLSI.2025.3558946

Vuk Vranjkovic;Predrag Teodorovic;Rastislav Struharik

Abstract: Upscaling layers are important components of modern deep learning networks but often pose computational challenges for hardware (HW) accelerators. This article addresses this issue by introducing a novel layer-replacement technique to efficiently process upscaling layers using existing hardware-supported operations like depthwise convolutions and maximum pooling. To minimize the number of replacement layers, we propose an efficient layer number reduction algorithm. Experimental results on four deep neural networks demonstrate a significant speedup ranging from $1.58\times$ to $32.88\times$ compared to the original HW/software (SW) execution approach, and from $3.65\times$ to $19.21\times$ compared to the software-only solution, with minimal hardware overhead (0.068% more field-programmable gate array (FPGA) look-up tables (LUTs)). Notably, our technique introduces no numerical errors and maintains comparable input data processing quality to the original network.

Published in: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 6, pp. 1624-1637. Journal Article.

Citations: 0

DMSA: An Efficient Architecture for Sparse–Sparse Matrix Multiplication Based on Distribute-Merge Product Dataflow

IF 2.8 | Zone 2, Engineering and Technology

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-04-23 DOI: 10.1109/TVLSI.2025.3558895

Yuta Nagahara;Jiale Yan;Kazushi Kawamura;Daichi Fujiki;Masato Motomura;Thiem Van Chu

Abstract: The sparse–sparse matrix multiplication (SpMSpM) is a fundamental operation in various applications. Existing SpMSpM accelerators based on inner product (IP) and outer product (OP) suffer from low computational efficiency and high memory traffic due to inefficient index matching and merging overheads. Gustavson’s product (GP)-based accelerators mitigate some of these challenges but struggle with workload imbalance and irregular memory access patterns, limiting computational parallelism. To overcome these limitations, we propose a distribute-merge product (DMP), a novel SpMSpM dataflow that evenly distributes workloads across multiple computation streams and merges partial results efficiently. We design and implement DMP-based SpMSpM architecture (DMSA), incorporating four key techniques to fully exploit the parallelism of DMP and efficiently handle irregular memory accesses. Implemented on a Xilinx ZCU106 FPGA, DMSA achieves speedups of up to $3.38\times$ and $1.73\times$ over two state-of-the-art FPGA-based SpMSpM accelerators while maintaining comparable hardware resource usage. In addition, compared to CPU and GPU implementations on an NVIDIA Jetson AGX Xavier, DMSA is $4.96\times$ and $1.53\times$ faster while achieving $6.67\times$ and $2.33\times$ better energy efficiency, respectively.

Published in: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 7, pp. 1858-1871. Journal Article.
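
For readers unfamiliar with the baseline the abstract compares against, the following is a minimal software sketch of Gustavson's row-wise product, not of the proposed DMP dataflow, whose details the abstract does not give. The dict-of-dicts storage and the name `spmspm_gustavson` are assumptions chosen for brevity.

```python
from collections import defaultdict

# Sparse matrix stored as {row index: {column index: value}} (assumed format).
SparseRows = dict[int, dict[int, float]]


def spmspm_gustavson(a_rows: SparseRows, b_rows: SparseRows) -> SparseRows:
    """Row-wise (Gustavson) product C = A * B for sparse A and B."""
    c_rows: SparseRows = {}
    for i, a_row in a_rows.items():
        acc: dict[int, float] = defaultdict(float)
        # Each nonzero A[i, k] scales row k of B; the scaled rows are
        # merged into one accumulator for row i of C.
        for k, a_ik in a_row.items():
            for j, b_kj in b_rows.get(k, {}).items():
                acc[j] += a_ik * b_kj
        if acc:
            c_rows[i] = dict(acc)
    return c_rows


a = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}}
b = {0: {1: 4.0}, 1: {0: 5.0}, 2: {1: 6.0}}
c = spmspm_gustavson(a, b)
print(c)
```

The work done for output row i depends on how many nonzeros row i of A has and on the lengths of the B rows they select, which is one way to see the workload imbalance and irregular access that the abstract attributes to GP-based designs.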

Citations: 0

An 197-μJ/Frame Single-Frame Bundle Adjustment Hardware Accelerator for Mobile Visual Odometry

IF 2.8 | Zone 2, Engineering and Technology

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-04-22 DOI: 10.1109/TVLSI.2025.3557872

Cheng Nian;Xiaorui Mo;Weiyi Zhang;Fasih Ud Din Farrukh;Yushi Guo;Fei Chen;Chun Zhang

Abstract: This article presents an energy-efficient hardware accelerator for optimized bundle adjustment (BA) for mobile high-frame-rate visual odometry (VO). BA uses graph optimization techniques to optimize poses and landmarks and the applications are robot navigation, virtual reality (VR), and augmented reality (AR). Existing software implementations of BA optimization involve complex computational flows, numerical calculations, Lie group, and Lie algebra conversions. This poses challenges of slow computational speeds and high power consumption. A two-level reuse hardware architecture is proposed and implemented that efficiently updates the Jacobian matrix while reducing the field-programmable gate array (FPGA) hardware resources by 25%. A set of methodologies is proposed to quantify the errors caused by fixed-point systems during optimization. A fully pipelined architecture is implemented to increase computational speed while reducing hardware resources by 29%. This design features a parallel equation solver that improves processing speed by $2\times$ compared to conventional approaches. This article employs a single-frame local BA VO on the KITTI dataset and EuRoC dataset, achieving an average translational error of 0.75% and a rotational error of $0.0028^{\circ}$/m. The proposed hardware achieves a performance ranging from 188 to 345 frames/s in optimizing two main feature extraction methods with a maximum of 512 extracted feature points. Compared to state-of-the-art implementations, the accelerator achieved a minimum energy efficiency ratio of 11.6 mJ and 191 μJ on the FPGA platform and application-specific integrated circuits (ASICs) platform, respectively. These improvements underscore the potential of FPGAs to enhance VO systems’ adaptability and efficiency in complex environments.

Published in: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 7, pp. 1872-1885. Journal Article.

Citations: 0