{"title":"Identifying Optimal Workload Offloading Partitions for CPU-PIM Graph Processing Accelerators","authors":"Sheng Xu;Chun Li;Le Luo;Wu Zhou;Liang Yan;Xiaoming Chen","doi":"10.1109/TVLSI.2025.3526201","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3526201","url":null,"abstract":"The integrated architecture that features both in-memory logic and host processors, or so-called “processing-in-memory” (PIM) architecture, is an emerging and promising solution to bridge the performance gap between the memory and host processors. In spite of the considerable potential of PIM, the workload offloading policy, which partitions the program and determines where code snippets are executed, is still a main challenge in PIM. In order to determine the best PIM offloading partitions, existing methods require in-depth program profiling to create the control flow graph (CFG) and then transform it into a graph-cut problem. These CFG-based solutions depend on detailed profiling of a crucial element, the execution time of basic blocks, to accurately assess the benefits of PIM offloading. The issue is that these execution times can change significantly in PIM, leading to inaccurate offloading decisions. To tackle this challenge, we present a novel PIM workload offloading framework called “RDPIM” for CPU-PIM graph processing accelerators, which systematically considers the variations in the execution time of basic blocks. By analyzing the relationship between data dependencies among workloads and the connectivity of input graphs, we identified three key features that can lead to variations in execution time. We developed a novel reuse distance (RD)-based model to predict the exact performance of basic blocks for optimal offloading decisions. We evaluate RDPIM using real-world graphs and compare it with some state-of-the-art PIM offloading approaches. Experiments have demonstrated that our method achieves an average speedup of <inline-formula> <tex-math>$2times $ </tex-math></inline-formula> compared to CPU-only executions and up to <inline-formula> <tex-math>$1.6times $ </tex-math></inline-formula> compared to state-of-the-art PIM offloading schemes.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 4","pages":"1053-1064"},"PeriodicalIF":2.8,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143675672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"M2-ViT: Accelerating Hybrid Vision Transformers With Two-Level Mixed Quantization","authors":"Yanbiao Liang;Huihong Shi;Zhongfeng Wang","doi":"10.1109/TVLSI.2024.3525184","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3525184","url":null,"abstract":"Although vision transformers (ViTs) have achieved significant success, their intensive computations and substantial memory overheads challenge their deployment on edge devices. To address this, efficient ViTs have emerged, typically featuring convolution-transformer hybrid architectures to enhance both accuracy and hardware efficiency. While prior work has explored quantization for efficient ViTs to marry the hardware efficiency of efficient hybrid ViT architectures and quantization, it focuses on uniform quantization and overlooks the potential advantages of mixed quantization. Meanwhile, although several works have studied mixed quantization for standard ViTs, they are not directly applicable to hybrid ViTs due to their distinct algorithmic and hardware characteristics. To bridge this gap, we present M2-ViT to accelerate convolution-transformer hybrid efficient ViTs with two-level mixed quantization (M2Q). Specifically, we introduce a hardware-friendly M2Q strategy, characterized by both mixed quantization precision and mixed quantization schemes [uniform and power-of-two (PoT)], to exploit the architectural properties of efficient ViTs. We further build a dedicated accelerator with heterogeneous computing engines to transform algorithmic benefits into real hardware improvements. The experimental results validate our effectiveness, showcasing an average of 80% energy-delay product (EDP) saving with comparable quantization accuracy compared to the prior work. Codes are available at <uri>https://github.com/lybbill/M2ViT</uri>.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 5","pages":"1492-1496"},"PeriodicalIF":2.8,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143875130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SPICED+: Syntactical Bug Pattern Identification and Correction of Trojans in A/MS Circuits Using LLM-Enhanced Detection","authors":"Jayeeta Chaudhuri;Dhruv Thapar;Arjun Chaudhuri;Farshad Firouzi;Krishnendu Chakrabarty","doi":"10.1109/TVLSI.2025.3527382","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3527382","url":null,"abstract":"Analog and mixed-signal (A/MS) integrated circuits (ICs) are crucial in modern electronics, playing key roles in signal processing, amplification, sensing, and power management. Many IC companies outsource manufacturing to third-party foundries, creating security risks such as syntactical bugs and stealthy analog Trojans. Traditional Trojan detection methods, including embedding circuit watermarks and hardware-based monitoring, impose significant area and power overheads while failing to effectively identify and localize the Trojans. To overcome these shortcomings, we present SPICED+, a software-based framework designed for syntactical bug pattern identification and the correction of Trojans in A/MS circuits, leveraging large language model (LLM)-enhanced detection. It uses LLM-aided techniques to detect, localize, and iteratively correct analog Trojans in SPICE netlists, without requiring explicit model training, and thus incurs zero area overhead. The framework leverages chain-of-thought reasoning and few-shot learning to guide the LLMs in understanding and applying anomaly detection rules, enabling accurate identification and correction of Trojan-impacted nodes. With the proposed method, we achieve an average Trojan coverage of 93.3%, average Trojan correction rate of 91.2%, and an average false-positive rate of 1.4%.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 4","pages":"1118-1131"},"PeriodicalIF":2.8,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143676129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SysCIM: A Heterogeneous Chip Architecture for High-Efficiency CNN Training at Edge","authors":"Shuai Wang;Ziwei Li;Yuang Ma;Yi Kang","doi":"10.1109/TVLSI.2025.3526363","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3526363","url":null,"abstract":"Neural network training is notoriously computationally intensive and time-consuming. Quantization technology is promising to improve training efficiency by using lower data bitwidths to reduce storage and computing requirements. Currently, state-of-the-art quantization training algorithms have a negligible loss of accuracy, which requires dedicated quantization circuits for dynamic quantization of large amounts of data. In addition, the matrix transposition problem during neural network training gradually becomes a challenge as the network size increases. To address this problem, we propose a quantized training architecture which is a heterogeneous architecture consisting of a computing-in-memory (CIM) macro and a systolic array. First, the CIM macro realizes efficient transpose matrix multiplication through flexible data path control, which handles the need for transpose operation of the weight matrix in neural network training. Second, the systolic array utilizes two different data flows in the forward (FW) and backward (BW) propagation for the transpose matrix multiplication of the activation matrix in neural network training and provides higher computational throughput. Then, we design efficient dedicated quantization circuits for quantization algorithms to support efficient quantization training. Experimental results show that the area and power consumption of the two specialized quantization circuits are reduced by a factor of 1.35 and 5.4, on average, compared to floating-point computing circuits. The architecture achieves 4.05 tera operations per second per wat (TOPS/W) energy efficiency @ INT8 convolutional neural network (CNN) training at the 28-nm process. Compared to a state of the art (SOTA) quantization training architecture, SysCIM shows <inline-formula> <tex-math>$1.8times $ </tex-math></inline-formula> energy efficiency.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 4","pages":"990-1003"},"PeriodicalIF":2.8,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143676127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable and Low-Cost NTT Architecture With Conflict-Free Memory Access Scheme","authors":"Zhenyang Wu;Ruichen Kan;Jianbo Guo;Hao Xiao","doi":"10.1109/TVLSI.2025.3526261","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3526261","url":null,"abstract":"This brief proposes a scalable multistage and multipath architecture for variable number-theoretic transform (NTT). The proposed architecture adopts multiple parallel paths, each of which uses cascaded radix-2 butterfly units (BFUs). The radix-2 scheme simplifies the control logic and the cascaded BFU structure reduces the amount of RAM banks and the frequency of memory accesses. Moreover, a conflict-free and hardware-friendly in-place memory mapping scheme is proposed to ease the adaption to multiple paths, letting it be scalable for various throughputs. Compared with state-of-the-art works, the proposed architecture uses fewer resources and has better area-time product performance without penalty in throughput.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 5","pages":"1407-1411"},"PeriodicalIF":2.8,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143875117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An On-Chip-Training Keyword-Spotting Chip Using Interleaved Pipeline and Computation-in-Memory Cluster in 28-nm CMOS","authors":"Junyi Qian;Cai Li;Long Chen;Ruidong Li;Tuo Li;Peng Cao;Xin Si;Weiwei Shan","doi":"10.1109/TVLSI.2025.3525740","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3525740","url":null,"abstract":"To improve the precision of keyword spotting (KWS) for individual users on edge devices, we propose an on-chip-training KWS (OCT-KWS) chip for private data protection while also achieving ultralow -power inference. Our main contributions are: 1) identity interchange and interleaved pipeline methods during backpropagation (BP), enabling the pipelined execution of operations that traditionally had to be performed sequentially, reducing cache requirements for loss values by 95.8%; 2) all-digital isolated-bitline (BL)-based computation-in-memory (CIM) macro, eliminating ineffective computations caused by glitches, achieving <inline-formula> <tex-math>$2.03times $ </tex-math></inline-formula> higher energy efficiency; and 3) multisize CIM cluster-based BP data flow, designing each CIM macro collaboratively to achieve all-time full utilization, reducing 47.2% of output feature map (Ofmap) access. Fabricated in 28-nm CMOS and enhanced with a refined library characterization methodology, this chip achieves both the highest training energy efficiency of 101.5 TOPS/W and the lowest inference energy of 9.9nJ/decision among current KWS chips. By retraining a three-class depthwise-separable convolutional neural network (DSCNN), detection accuracy on the private dataset increases from 80.8% to 98.9%.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 5","pages":"1497-1501"},"PeriodicalIF":2.8,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143875221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fault Bounding On-Die BCH Codes for Improving Reliability of System ECC","authors":"Seongyoon Kang;Chaehyeon Shin;Jongsun Park","doi":"10.1109/TVLSI.2024.3523899","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3523899","url":null,"abstract":"While continuous dynamic random access memory (DRAM) scaling may require an on-die error correction code (ECC) with enhanced correction capability, a double error correcting code with fault bounding scheme has not been explored. In this brief, we present the fault bounding on-die Bose-Chaudhuri–Hocquenghem (BCH) code that improves the compatibility with one-symbol error correcting system ECC used in dual data rate five (DDR5) dual in-line memory module (DIMM). By modifying the H matrix of BCH code, the proposed decoding method determines the fault boundary within which burst errors occur, effectively preventing the spread of these errors across fault boundaries. A comparison of bounded rates with conventional codes illustrates the enhanced compatibility with system ECC. The encoder and decoder of the proposed code have been implemented using a 28-nm CMOS process to demonstrate the hardware cost.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 5","pages":"1482-1486"},"PeriodicalIF":2.8,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143875119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 65-nm 55.8-TOPS/W Compact 2T eDRAM-Based Compute-in-Memory Macro With Linear Calibration","authors":"Xueyong Zhang;Yong-Jun Jo;Tony Tae-Hyoung Kim","doi":"10.1109/TVLSI.2024.3520588","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3520588","url":null,"abstract":"Implementing parallel computing inside memory units, compute-in-memory (CIM) has shown significant energy and latency reduction, which are suitable for neural network accelerators, especially for low-power edge devices. This brief presents a compact 2T-eDRAM CIM structure to support signed 4b/4b/6b input/weight/output precision multiply-accumulate (MAC) operation, exploring a near-zero-skipping (NZS) technique to improve energy efficiency further and reduce weight update time. The center weight first (CWF) update method is proposed to extend the overall weight retention time. Furthermore, the analog multiplication and accumulation nonlinear compensation techniques are employed to improve the accuracy and linear range. Fabricated in 65-nm CMOS technology, this chip achieves the weight bit storage density of 3.7 Mb/mm2 and SWaP figure of merit of 210 TOPS/W Mb/mm2. The measured energy efficiency shows an average of 55.8 TOPS/W with the 4b/4b/6b input/weight/output precision at 1.2 V and 100 MHz.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 5","pages":"1477-1481"},"PeriodicalIF":2.8,"publicationDate":"2025-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143875239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Quad-Core VCO Incorporating Area-Saving Folded S-Shaped Tail Filtering in 28-nm CMOS","authors":"Shan Lu;Danyu Wu;Xuan Guo;Hanbo Jia;Yong Chen;Xinyu Liu","doi":"10.1109/TVLSI.2024.3498940","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3498940","url":null,"abstract":"This brief reports on a 13-GHz quad-core voltage-controlled oscillator (VCO) using a folded S-shaped tail inductor. The contribution of this work is that the auxiliary resonator is folded into the main inductor, so that it leads to a more compact solution than a conventional scheme. Due to the S-shaped inductor’s electromagnetic (EM) characteristics, the proposed tail filter can achieve noise suppression without EM interference to the main tank. Designed and implemented in a 28-nm CMOS process, the proposed VCO operates between 12.32 and 13.84 GHz, for an 11.6% turning range. The measurements were carried out in the free-running mode, and the results show a phase noise (PN) of 118.3 dBc/Hz at a 1-MHz offset from the central frequency of 12.32 GHz. The power consumption of the VCO core is 24.5 mW, with a 0.9-V supply voltage, and this leads to a figure of merit (FoM) of 186.6 dBc/Hz.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 4","pages":"1162-1166"},"PeriodicalIF":2.8,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143676078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems Society Information","authors":"","doi":"10.1109/TVLSI.2024.3517117","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3517117","url":null,"abstract":"","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 1","pages":"C3-C3"},"PeriodicalIF":2.8,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10818619","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142905766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}