AdAM: Adaptive Approximate Multiplier for Fault Tolerance in DNN Accelerators

IF 2.5 3区 工程技术 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC
Mahdi Taheri;Natalia Cherezova;Samira Nazari;Ali Azarpeyvand;Tara Ghasempouri;Masoud Daneshtalab;Jaan Raik;Maksim Jenihhin
{"title":"AdAM: Adaptive Approximate Multiplier for Fault Tolerance in DNN Accelerators","authors":"Mahdi Taheri;Natalia Cherezova;Samira Nazari;Ali Azarpeyvand;Tara Ghasempouri;Masoud Daneshtalab;Jaan Raik;Maksim Jenihhin","doi":"10.1109/TDMR.2024.3523386","DOIUrl":null,"url":null,"abstract":"Deep Neural Network (DNN) hardware accelerators are essential in a spectrum of safety-critical edge-AI applications with stringent reliability, energy efficiency, and latency requirements. Multiplication is the most resource-hungry operation in the neural network’s processing elements. This paper proposes a scalable adaptive fault-tolerant approximate multiplier (AdAM) tailored for ASIC-based DNN accelerators at the algorithm and circuit levels. AdAM employs an adaptive adder that relies on an unconventional use of input Leading One Detector (LOD) values for fault detection by optimizing unutilized adder resources. A gate-level optimized LOD design and a hybrid adder design are also proposed as a part of the adaptive multiplier to improve the hardware performance. The proposed architecture uses a lightweight fault mitigation technique that sets the detected faulty bits to zero. The hardware resource utilization and the DNN accelerator’s reliability metrics are used to compare the proposed solution against the Triple Modular Redundancy (TMR) in multiplication, unprotected exact multiplication, and unprotected approximate multiplication. It is demonstrated that the proposed architecture enables a multiplication with a reliability level close to the multipliers protected by TMR while at the same time utilizing <inline-formula> <tex-math>$2.74 \\times $ </tex-math></inline-formula> less area and with 39.06% less power-delay product compared to the exact multiplier. Moreover, it has similar area, delay, and power consumption parameters compared to the state-of-the-art approximate multipliers with similar accuracy while providing fault detection and mitigation capability.","PeriodicalId":448,"journal":{"name":"IEEE Transactions on Device and Materials Reliability","volume":"25 1","pages":"66-75"},"PeriodicalIF":2.5000,"publicationDate":"2024-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Device and Materials Reliability","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10816697/","RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0

Abstract

Deep Neural Network (DNN) hardware accelerators are essential in a spectrum of safety-critical edge-AI applications with stringent reliability, energy efficiency, and latency requirements. Multiplication is the most resource-hungry operation in the neural network’s processing elements. This paper proposes a scalable adaptive fault-tolerant approximate multiplier (AdAM) tailored for ASIC-based DNN accelerators at the algorithm and circuit levels. AdAM employs an adaptive adder that relies on an unconventional use of input Leading One Detector (LOD) values for fault detection by optimizing unutilized adder resources. A gate-level optimized LOD design and a hybrid adder design are also proposed as a part of the adaptive multiplier to improve the hardware performance. The proposed architecture uses a lightweight fault mitigation technique that sets the detected faulty bits to zero. The hardware resource utilization and the DNN accelerator’s reliability metrics are used to compare the proposed solution against the Triple Modular Redundancy (TMR) in multiplication, unprotected exact multiplication, and unprotected approximate multiplication. It is demonstrated that the proposed architecture enables a multiplication with a reliability level close to the multipliers protected by TMR while at the same time utilizing $2.74 \times $ less area and with 39.06% less power-delay product compared to the exact multiplier. Moreover, it has similar area, delay, and power consumption parameters compared to the state-of-the-art approximate multipliers with similar accuracy while providing fault detection and mitigation capability.
求助全文
约1分钟内获得全文 求助全文
来源期刊
IEEE Transactions on Device and Materials Reliability
IEEE Transactions on Device and Materials Reliability 工程技术-工程:电子与电气
CiteScore
4.80
自引率
5.00%
发文量
71
审稿时长
6-12 weeks
期刊介绍: The scope of the publication includes, but is not limited to Reliability of: Devices, Materials, Processes, Interfaces, Integrated Microsystems (including MEMS & Sensors), Transistors, Technology (CMOS, BiCMOS, etc.), Integrated Circuits (IC, SSI, MSI, LSI, ULSI, ELSI, etc.), Thin Film Transistor Applications. The measurement and understanding of the reliability of such entities at each phase, from the concept stage through research and development and into manufacturing scale-up, provides the overall database on the reliability of the devices, materials, processes, package and other necessities for the successful introduction of a product to market. This reliability database is the foundation for a quality product, which meets customer expectation. A product so developed has high reliability. High quality will be achieved because product weaknesses will have been found (root cause analysis) and designed out of the final product. This process of ever increasing reliability and quality will result in a superior product. In the end, reliability and quality are not one thing; but in a sense everything, which can be or has to be done to guarantee that the product successfully performs in the field under customer conditions. Our goal is to capture these advances. An additional objective is to focus cross fertilized communication in the state of the art of reliability of electronic materials and devices and provide fundamental understanding of basic phenomena that affect reliability. In addition, the publication is a forum for interdisciplinary studies on reliability. An overall goal is to provide leading edge/state of the art information, which is critically relevant to the creation of reliable products.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信