AdAM: Adaptive Approximate Multiplier for Fault Tolerance in DNN Accelerators

IF 2.3 · Zone 3, Engineering & Technology · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Device and Materials Reliability Pub Date : 2024-12-27 DOI:10.1109/TDMR.2024.3523386

Mahdi Taheri;Natalia Cherezova;Samira Nazari;Ali Azarpeyvand;Tara Ghasempouri;Masoud Daneshtalab;Jaan Raik;Maksim Jenihhin

IEEE Transactions on Device and Materials Reliability, vol. 25, no. 1, pp. 66-75. Source: https://ieeexplore.ieee.org/document/10816697/

Citations: 0

Abstract

Deep Neural Network (DNN) hardware accelerators are essential in a spectrum of safety-critical edge-AI applications with stringent reliability, energy efficiency, and latency requirements. Multiplication is the most resource-hungry operation in the neural network’s processing elements. This paper proposes a scalable adaptive fault-tolerant approximate multiplier (AdAM) tailored for ASIC-based DNN accelerators at the algorithm and circuit levels. AdAM employs an adaptive adder that relies on an unconventional use of input Leading One Detector (LOD) values for fault detection by optimizing unutilized adder resources. A gate-level optimized LOD design and a hybrid adder design are also proposed as a part of the adaptive multiplier to improve the hardware performance. The proposed architecture uses a lightweight fault mitigation technique that sets the detected faulty bits to zero. The hardware resource utilization and the DNN accelerator’s reliability metrics are used to compare the proposed solution against the Triple Modular Redundancy (TMR) in multiplication, unprotected exact multiplication, and unprotected approximate multiplication. It is demonstrated that the proposed architecture enables a multiplication with a reliability level close to the multipliers protected by TMR while at the same time utilizing 2.74× less area and with 39.06% less power-delay product compared to the exact multiplier. Moreover, it has similar area, delay, and power consumption parameters compared to the state-of-the-art approximate multipliers with similar accuracy while providing fault detection and mitigation capability.
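The abstract names three building blocks without giving their circuits: a Leading One Detector, zeroing of detected faulty bits, and the TMR baseline. The Python sketch below illustrates each in software. It is not AdAM's design. `mitchell_multiply` uses Mitchell's logarithmic multiplication, a well-known LOD-based approximation chosen here only as a stand-in, since the abstract does not state which approximation AdAM uses. All function names are hypothetical, and `fault_mask` is assumed to come from some detector that the abstract does not describe.

```python
def leading_one(x: int) -> int:
    """Bit index of the most significant 1 in x (software stand-in for an LOD)."""
    if x <= 0:
        raise ValueError("leading_one expects a positive integer")
    return x.bit_length() - 1


def mitchell_multiply(a: int, b: int) -> int:
    """Approximate a * b for non-negative ints with Mitchell's log method.

    Assumption: generic textbook technique, not the AdAM circuit.
    """
    if a == 0 or b == 0:
        return 0
    ka, kb = leading_one(a), leading_one(b)
    # Fractional parts x = (v - 2**k) / 2**k, aligned to a common scale.
    frac_bits = max(ka, kb)
    fa = (a - (1 << ka)) << (frac_bits - ka)
    fb = (b - (1 << kb)) << (frac_bits - kb)
    s = fa + fb                # xa + xb in units of 2**-frac_bits
    one = 1 << frac_bits       # fixed-point representation of 1.0
    k = ka + kb
    if s < one:
        # a * b ~= 2**k * (1 + xa + xb)
        return ((one + s) << k) >> frac_bits
    # a * b ~= 2**(k + 1) * (xa + xb)
    return (s << (k + 1)) >> frac_bits


def zero_faulty_bits(value: int, fault_mask: int) -> int:
    """Mitigation as described in the abstract: force flagged bits to zero.

    fault_mask is hypothetical; a 1 marks a bit position flagged as faulty.
    """
    return value & ~fault_mask


def tmr_vote(x: int, y: int, z: int) -> int:
    """Bitwise majority vote over three redundant results (TMR baseline)."""
    return (x & y) | (y & z) | (x & z)
```

For example, `mitchell_multiply(6, 5)` yields 28 against an exact product of 30, and `tmr_vote` masks a single corrupted copy at the cost of computing the product three times, which is the overhead an approximate, self-checking multiplier tries to avoid.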


Source journal

IEEE Transactions on Device and Materials Reliability (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore

4.80

Self-citation rate

5.00%

Articles published

Review time

6-12 weeks

Journal description: The scope of the publication includes, but is not limited to, the reliability of: Devices, Materials, Processes, Interfaces, Integrated Microsystems (including MEMS & Sensors), Transistors, Technology (CMOS, BiCMOS, etc.), Integrated Circuits (IC, SSI, MSI, LSI, ULSI, ELSI, etc.), and Thin Film Transistor Applications. The measurement and understanding of the reliability of such entities at each phase, from the concept stage through research and development and into manufacturing scale-up, provides the overall database on the reliability of the devices, materials, processes, packages, and other necessities for the successful introduction of a product to market. This reliability database is the foundation for a quality product that meets customer expectations. A product so developed has high reliability. High quality will be achieved because product weaknesses will have been found (root cause analysis) and designed out of the final product. This process of ever-increasing reliability and quality will result in a superior product. In the end, reliability and quality are not one thing but, in a sense, everything that can be or has to be done to guarantee that the product performs successfully in the field under customer conditions. Our goal is to capture these advances. An additional objective is to focus cross-fertilized communication on the state of the art of reliability of electronic materials and devices and to provide a fundamental understanding of the basic phenomena that affect reliability. In addition, the publication is a forum for interdisciplinary studies on reliability. An overall goal is to provide leading-edge, state-of-the-art information that is critically relevant to the creation of reliable products.