Jae-Youn Hong, Je-Woo Jang, Sung-Hyuk Cho, Youngbae Kong, Sungkyu Kim, Youngjung Kang, Jaehyung Ko, Jaeyong Chung, Joon-Sung Yang
Journal of Systems Architecture, Volume 163, Article 103409. Published 2025-04-10. DOI: 10.1016/j.sysarc.2025.103409
Impact factor: 3.7 (JCR Q1, Computer Science, Hardware & Architecture). Open access: no.
https://www.sciencedirect.com/science/article/pii/S1383762125000815
Citations: 0
Reducing Errors and Powers in LPDDR for DNN Inference: A Compression and IECC-Based Approach
In modern edge systems, the demand for data processing, especially for complex DNN tasks, is rapidly increasing. To address this, various compression schemes have been proposed to enable on-device AI while meeting the strict power and storage constraints of edge devices. However, despite these advancements, the compatibility of the compression methods with edge device memory, such as LPDDR, has not been thoroughly investigated. LPDDR operates at low voltage and faces reliability challenges like cell leakage, which is particularly concerning for applications where accuracy is critical, such as Advanced Driver Assistance Systems (ADAS) or medical devices. To address these reliability concerns, an ECC engine, known as IECC, is employed within each LPDDR bank. While IECC improves reliability, it also incurs performance penalties due to Read-Modify-Write (RMW) operations and parity storage overheads. This paper introduces RELIA, a DNN weight compression scheme with three-stage protection, aimed at enabling power-efficient and reliable DNN operations in mobile environments. RELIA reduces the operation granularity of the IECC engine to eliminate RMW overhead. Additionally, it proposes a SEC-FOEC(72,64) scheme (Single Error Correction-Frequently Occurring Error Correction) that can correct 99.97% of LPDDR errors. To mitigate the added storage overhead, a compression scheme based on DNN weight characteristics is introduced. Experimental results show RELIA outperforms traditional IECC schemes, reducing power by 16.12%, cycles by 12.6%, energy by 30.62%, and storage by 22.78%, while offering superior reliability in DNN inference.
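The abstract names a (72,64) code, meaning 64 data bits protected by 8 check bits. As a rough illustration of what single error correction at that size involves, here is a minimal sketch of a generic textbook syndrome decoder. It is not the paper's SEC-FOEC scheme: the parity-check columns, the bit layout, and the names `encode`, `decode`, and `DATA_COLUMNS` are all assumptions made for this example, and the "frequently occurring error" correction described in the abstract is not modelled.

```python
from typing import List, Optional, Tuple

DATA_BITS = 64   # payload width from the (72,64) notation
CHECK_BITS = 8   # 72 - 64 check bits


def _build_columns() -> List[int]:
    """Pick 64 distinct 8-bit columns with at least two bits set.

    Hypothetical choice for this sketch: any set of distinct nonzero
    columns yields a single-error-correcting code. Weight-1 values are
    reserved for the check bits themselves.
    """
    cols = [v for v in range(1, 1 << CHECK_BITS) if bin(v).count("1") >= 2]
    return cols[:DATA_BITS]


DATA_COLUMNS = _build_columns()

# Map each single-bit-error syndrome to the codeword bit it identifies:
# positions 0..63 are data bits, positions 64..71 are check bits.
SYNDROME_TO_BIT = {col: i for i, col in enumerate(DATA_COLUMNS)}
SYNDROME_TO_BIT.update({1 << j: DATA_BITS + j for j in range(CHECK_BITS)})


def compute_check(data: int) -> int:
    """XOR together the columns of every data bit that is set."""
    check = 0
    for i, col in enumerate(DATA_COLUMNS):
        if (data >> i) & 1:
            check ^= col
    return check


def encode(data: int) -> int:
    """Return a 72-bit codeword: data in bits 0..63, check in bits 64..71."""
    if not 0 <= data < (1 << DATA_BITS):
        raise ValueError("data must fit in 64 bits")
    return data | (compute_check(data) << DATA_BITS)


def decode(codeword: int) -> Tuple[int, Optional[int]]:
    """Return (data, corrected_bit).

    corrected_bit is None when the syndrome is zero. A ValueError is
    raised when the syndrome matches no single-bit error pattern.
    """
    mask = (1 << DATA_BITS) - 1
    syndrome = compute_check(codeword & mask) ^ (codeword >> DATA_BITS)
    if syndrome == 0:
        return codeword & mask, None
    bit = SYNDROME_TO_BIT.get(syndrome)
    if bit is None:
        raise ValueError("uncorrectable error pattern")
    return (codeword ^ (1 << bit)) & mask, bit
```

Flipping any one of the 72 stored bits and calling `decode` recovers the original 64-bit word. Multi-bit errors may be flagged or silently miscorrected by this plain SEC sketch, which is the kind of gap an extended scheme has to address. Storing 8 check bits per 64 data bits is a 12.5% parity overhead, which is consistent with the abstract's motivation for pairing the code with weight compression.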
About the journal:
The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, parallel and distributed architectures as well as additional subjects in the computer and system architecture area will fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software.
Design automation of such systems, including methodologies, techniques and tools for their design, as well as novel designs of software components, falls within the scope of this journal. Novel applications that use embedded systems are also central to this journal. While hardware is not a part of this journal, hardware/software co-design methods that consider the interplay between software and hardware components with an emphasis on software are also relevant here.