{"title":"ReTern: Exploiting Natural Redundancy and Sign Transformations for Enhanced Fault Tolerance in Compute-in-Memory-Based Ternary LLMs","authors":"Akul Malhotra;Sumeet Kumar Gupta","doi":"10.1109/TVLSI.2025.3585043","DOIUrl":null,"url":null,"abstract":"Ternary large language models (LLMs), which use ternary precision weights and 8-bit activations, have demonstrated competitive performance while significantly reducing the high computational and memory requirements of full-precision LLMs. The energy efficiency and performance of ternary LLMs can be further improved by deploying them on ternary computing-in-memory (TCiM) accelerators, thereby alleviating the von-Neumann bottleneck. However, TCiM accelerators are prone to memory stuck-at faults (SAFs) leading to degradation in model accuracy. This is particularly severe for LLMs due to their low weight sparsity. To boost SAF tolerance of TCiM accelerators, we propose ReTern that is based on 1) fault-aware sign transformations (FASTs) and 2) TCiM bitcell reprogramming exploiting their natural redundancy. The key idea is to use FAST to minimize computation errors due to SAFs in +1/−1 weights, while the natural bitcell redundancy is exploited to target SAFs in 0 weights (zero-fix). Our experiments on BitNet b1.58 700M and 3B ternary LLMs show that our technique furnishes significant fault tolerance, notably ~35% reduction in perplexity on the Wikitext dataset in the presence of faults. These benefits come at the cost of <3%, <7%, and <1% energy, latency, and area overheads, respectively.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 9","pages":"2518-2527"},"PeriodicalIF":3.1000,"publicationDate":"2025-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/11073133/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0
Abstract
Ternary large language models (LLMs), which use ternary precision weights and 8-bit activations, have demonstrated competitive performance while significantly reducing the high computational and memory requirements of full-precision LLMs. The energy efficiency and performance of ternary LLMs can be further improved by deploying them on ternary computing-in-memory (TCiM) accelerators, thereby alleviating the von-Neumann bottleneck. However, TCiM accelerators are prone to memory stuck-at faults (SAFs) leading to degradation in model accuracy. This is particularly severe for LLMs due to their low weight sparsity. To boost SAF tolerance of TCiM accelerators, we propose ReTern that is based on 1) fault-aware sign transformations (FASTs) and 2) TCiM bitcell reprogramming exploiting their natural redundancy. The key idea is to use FAST to minimize computation errors due to SAFs in +1/−1 weights, while the natural bitcell redundancy is exploited to target SAFs in 0 weights (zero-fix). Our experiments on BitNet b1.58 700M and 3B ternary LLMs show that our technique furnishes significant fault tolerance, notably ~35% reduction in perplexity on the Wikitext dataset in the presence of faults. These benefits come at the cost of <3%, <7%, and <1% energy, latency, and area overheads, respectively.
期刊介绍:
The IEEE Transactions on VLSI Systems is published as a monthly journal under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society.
Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing and systems applications. Generation of specifications, design and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor and process levels.
To address this critical area through a common forum, the IEEE Transactions on VLSI Systems have been founded. The editorial board, consisting of international experts, invites original papers which emphasize and merit the novel systems integration aspects of microelectronic systems including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and systems level qualification. Thus, the coverage of these Transactions will focus on VLSI/ULSI microelectronic systems integration.