IEEE Transactions on Very Large Scale Integration (VLSI) Systems: Latest Articles, Page 3

ReTern: Exploiting Natural Redundancy and Sign Transformations for Enhanced Fault Tolerance in Compute-in-Memory-Based Ternary LLMs

IF 3.1, Zone 2 (Engineering and Technology)

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-07-08 DOI: 10.1109/TVLSI.2025.3585043

Akul Malhotra;Sumeet Kumar Gupta

Abstract: Ternary large language models (LLMs), which use ternary precision weights and 8-bit activations, have demonstrated competitive performance while significantly reducing the high computational and memory requirements of full-precision LLMs. The energy efficiency and performance of ternary LLMs can be further improved by deploying them on ternary computing-in-memory (TCiM) accelerators, thereby alleviating the von-Neumann bottleneck. However, TCiM accelerators are prone to memory stuck-at faults (SAFs) leading to degradation in model accuracy. This is particularly severe for LLMs due to their low weight sparsity. To boost SAF tolerance of TCiM accelerators, we propose ReTern that is based on 1) fault-aware sign transformations (FASTs) and 2) TCiM bitcell reprogramming exploiting their natural redundancy. The key idea is to use FAST to minimize computation errors due to SAFs in +1/−1 weights, while the natural bitcell redundancy is exploited to target SAFs in 0 weights (zero-fix). Our experiments on BitNet b1.58 700M and 3B ternary LLMs show that our technique furnishes significant fault tolerance, notably ~35% reduction in perplexity on the Wikitext dataset in the presence of faults. These benefits come at the cost of <3%, <7%, and <1% energy, latency, and area overheads, respectively.

Vol. 33, No. 9, pp. 2518-2527
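The sign-transformation idea mentioned in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation: it assumes a simplified fault model in which a faulty cell always reads back one fixed ternary value, and all names (`read_back`, `choose_sign`, `faulty_dot`) are hypothetical. It relies only on the identity w·x = −((−w)·x), so a group of weights can be stored with flipped signs when that polarity happens to agree better with the stuck cells, and the output is negated afterwards.

```python
from typing import List, Optional

# Toy fault model (an assumption, not the paper's circuit-level model):
# None means the cell is healthy; otherwise the cell always reads back
# the given ternary value, whatever was written to it.
Fault = Optional[int]


def read_back(stored: List[int], faults: List[Fault]) -> List[int]:
    """Weights actually seen by the compute array after stuck-at faults."""
    return [w if f is None else f for w, f in zip(stored, faults)]


def dot(weights: List[int], activations: List[int]) -> int:
    """Plain integer dot product."""
    return sum(w * a for w, a in zip(weights, activations))


def choose_sign(weights: List[int], faults: List[Fault]) -> int:
    """Return +1 or -1, whichever stored polarity corrupts fewer weights."""
    def corrupted(sign: int) -> int:
        stored = [sign * w for w in weights]
        return sum(s != r for s, r in zip(stored, read_back(stored, faults)))
    return 1 if corrupted(1) <= corrupted(-1) else -1


def faulty_dot(weights: List[int], faults: List[Fault],
               activations: List[int], sign: int) -> int:
    """Store sign*w, compute on the faulty array, then undo the sign."""
    stored = [sign * w for w in weights]
    return sign * dot(read_back(stored, faults), activations)


weights = [1, -1, 1, 0]
faults = [-1, None, -1, None]   # cells 0 and 2 are stuck at -1
activations = [3, 2, 5, 7]

ideal = dot(weights, activations)
best = choose_sign(weights, faults)
# prints: 6 -10 6
print(ideal,
      faulty_dot(weights, faults, activations, 1),
      faulty_dot(weights, faults, activations, best))
```

In this example the flipped polarity matches both stuck cells, so the error disappears entirely; in general the transformation can only reduce, not eliminate, the number of corrupted weights in a group.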

Citations: 0

FPGA-Oriented Design and Efficient Implementation of a Geometrically Tunable Multiscroll Conservative Chaotic System Without Equilibrium Points

IF 3.1, Zone 2 (Engineering and Technology)

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-07-04 DOI: 10.1109/TVLSI.2025.3580266

Yerui Guang;Qun Ding;Dongxu Liu

Abstract: Although multiscroll conservative chaotic systems exhibit rich dynamical characteristics and hold great potential for secure communications, existing designs generally suffer from limited controllability and low hardware implementation efficiency. To address these challenges, this article proposes a novel 4-D multiscroll conservative chaotic system based on a nonlinear feedback structure constructed using the floor function. This original approach simplifies the system’s logical structure, facilitating efficient hardware modeling while enabling flexible control over the number, amplitude, and spatial distribution of scrolls in 3-D space. The system’s high complexity and coexisting behaviors are validated through dynamical analyses, including equilibrium point analysis, Poincaré sections, and Lyapunov exponents (LEs). To achieve efficient deployment of the chaotic system on field-programmable gate array (FPGA) platforms, this article first simplifies the hardware implementation logic of the feedback structure through the design of an algorithmic model based on bitwise operations. Subsequently, precise control of the system’s module signals is achieved through a finite state machine (FSM) design. The results of the resource comparison analysis indicate that the proposed model achieves a high throughput of 10.08 Gbps while consuming only 1051 look-up tables (LUTs). The lower energy efficiency is 0.0264 mW/Mbps. Hardware-software co-simulation and oscilloscope visual output confirm the numerical precision and hardware feasibility of the proposed system. Finally, this system is integrated with the ZUC stream cipher to construct a novel encryption core, enabling asynchronous ciphertext transmission as well as encryption and decryption functions, thereby demonstrating its potential for secure hardware applications.

Vol. 33, No. 9, pp. 2528-2541
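The abstract mentions realizing a floor-function feedback structure with bitwise operations. The paper's actual model is not given here, so the following is only a generic sketch of one well-known way to do this: in two's-complement fixed point, an arithmetic right shift (or masking off the fractional bits) yields the floor without any comparator or divider. The Q-format width `FRAC_BITS` and the helper names are assumptions made for illustration.

```python
import math

FRAC_BITS = 16                      # assumed number of fractional bits
SCALE = 1 << FRAC_BITS
FRAC_MASK = SCALE - 1


def to_fixed(x: float) -> int:
    """Quantize a real value to a signed fixed-point integer."""
    return int(round(x * SCALE))


def floor_fixed(v: int) -> int:
    """Integer floor of a fixed-point value via arithmetic right shift."""
    # Python's >> on negative ints rounds toward minus infinity,
    # matching an arithmetic shift on two's-complement hardware.
    return v >> FRAC_BITS


def floor_fixed_same_format(v: int) -> int:
    """Floor that stays in fixed-point format: clear the fractional bits."""
    return v & ~FRAC_MASK


samples = [2.75, 0.5, -0.5, -1.25, -3.0]
results = [floor_fixed(to_fixed(x)) for x in samples]
# prints: [2, 0, -1, -2, -3]
print(results)
```

On an FPGA both variants reduce to wiring (dropping or zeroing the low bits), which is consistent with the kind of simplification the abstract describes, although the authors' exact construction may differ.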

Citations: 0

A Secure-by-Design Hardware/Operating System as a Substrate for Trustworthy Computing

IF 3.1, Zone 2 (Engineering and Technology)

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-07-03 DOI: 10.1109/TVLSI.2025.3579484

Sebastian Haas;Christopher Dunkel;Friedrich Pauls;Mattis Hasler;Yogesh Verma;Nilanjana Das;Michael Raitza

Abstract: Nowadays, digital devices like sensors, cell phones, and home servers are deeply embedded in our world to make our daily lives easier. Since we heavily rely on these systems, it is crucial to guarantee their correct functionality and to ensure security and privacy properties. As systems become increasingly complex, it is difficult to maintain security since it necessitates a thorough understanding of all functionalities in hardware and software. Complexity may lead to vulnerabilities that malicious components can exploit. These components can compromise security features provided by the processing cores and the operating system (OS), jeopardizing the overall trustworthiness of the system. In this article, we provide a secure-by-default hardware/OS co-design to build a substrate for trustworthy computing in digital devices. The design is based on a tiled architecture that can integrate untrusted hardware components. Instead of relying on isolation mechanisms of potentially malicious components, isolation is achieved by dedicated and independent hardware components called trusted communication units (TCUs). By keeping the attack surface small and isolating all components by default, malicious hardware and software are restricted in access permissions and, hence, cannot easily break the system’s security. We implemented a TCU-based multiprocessor architecture in a silicon research chip, called Masur23, and ran transfer workloads and selected portions of the microkernel-based OS M³. Our measurements demonstrate the feasibility of such a hardware/OS co-design for trustworthy computing. Compared to the entire chip implementation, security features require minimal latency, area, and power consumption overhead.

Vol. 33, No. 10, pp. 2862-2872

Citations: 0

Stochastic Belief Propagation-Based Iterative Detection and Decoding for MIMO Systems

IF 2.8, Zone 2 (Engineering and Technology)

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-07-02 DOI: 10.1109/TVLSI.2024.3477963

Muhao Li;Houren Ji;Xiaosi Tan;Chuan Zhang

Citations: 0

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Publication Information

IF 2.8, Zone 2 (Engineering and Technology)

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-06-30 DOI: 10.1109/TVLSI.2025.3579662

Citations: 0

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Society Information

IF 2.8, Zone 2 (Engineering and Technology)

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-06-30 DOI: 10.1109/TVLSI.2025.3579664

Citations: 0

Enhancing Memory BIST With an Optimized RTL-BIST IP Core: A Low-Power, High-Fault-Coverage Approach

IF 3.1, Zone 2 (Engineering and Technology)

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-06-27 DOI: 10.1109/TVLSI.2025.3581296

Ming-Yi Lin;Wei-Kuan Chiang;Chin-Hung Wang

Citations: 0

A Fast Floating-Point Multiply–Accumulator Optimized for Sparse Linear Algebra on FPGAs

IF 3.1, Zone 2 (Engineering and Technology)

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-06-23 DOI: 10.1109/TVLSI.2025.3578619

Kun Li;Xiangyu Hao;Zhenguo Ma;Feng Yu;Bo Zhang;Qianjian Xing

Citations: 0

RISC-V-Based GPGPU With Vector Capabilities for High-Performance Computing

IF 2.8, Zone 2 (Engineering and Technology)

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-06-23 DOI: 10.1109/TVLSI.2025.3574427

Jingzhou Li;Fangfei Yu;Mingyuan Ma;Wei Liu;Yuhan Wang;Hualin Wu;Hu He

Abstract: General-purpose graphics processing units (GPGPUs) have become a leading platform for accelerating modern compute-intensive applications, such as large language models and generative artificial intelligence (AI). However, the lack of advanced open-source GPGPU microarchitectures has hindered high-performance research in this area. In this article, we present Ventus, a high-performance open-source GPGPU implementation built upon the RISC-V architecture with vector extension [RISC-V vector (RVV)]. Ventus introduces customized instructions and a comprehensive software toolchain to optimize performance. We deployed the design on a field programmable gate array (FPGA) platform consisting of 4 Xilinx VU19P devices, scaling up to 16 streaming multiprocessors (SMs) and supporting 256 warps. Experimental results demonstrate that Ventus exhibits key performance features comparable to commercial GPGPUs, achieving an average of 83.9% instruction reduction and 87.4% cycle per instruction (CPI) improvement over the leading open-source alternatives. Under 4-, 8-, and 16-thread configurations, Ventus maintains robust instruction per cycle (IPC) performance with values of 0.47, 0.40, and 0.32, respectively. In addition, the tensor core of Ventus attains an extra average reduction of 69.1% in instruction count and a 68.4% cycle reduction ratio when running AI-related workloads. These findings highlight Ventus as a promising solution for future high-performance GPGPU research and development, offering a robust open-source alternative to proprietary solutions. Ventus can be found on https://github.com/THU-DSP-LAB/ventus-gpgpu

Vol. 33, No. 8, pp. 2239-2251

Citations: 0

A Sample-and-Hold-Based 453-ps True Time Delay Circuit With a Wide Bandwidth of 0.5–2.5 GHz in 65-nm CMOS

IF 2.8, Zone 2 (Engineering and Technology)

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-06-20 DOI: 10.1109/TVLSI.2025.3578959

Chuanjie Chen;Xiangyu Meng;Wang Xie;Baoyong Chi

Citations: 0