harDNNing: a machine-learning-based framework for fault tolerance assessment and protection of DNNs

2023 IEEE European Test Symposium (ETS). Pub Date: 2023-05-22. DOI: 10.1109/ETS56758.2023.10174178

Marcello Traiola, A. Kritikakou, O. Sentieys

Abstract

Deep Neural Networks (DNNs) show promising performance in several application domains, such as robotics, aerospace, smart healthcare, and autonomous driving. Nevertheless, DNN results may be incorrect, not only because of the network's intrinsic inaccuracy, but also because of faults affecting the hardware. Indeed, hardware faults may impact the DNN inference process and lead to prediction failures. Therefore, ensuring the fault tolerance of DNNs is crucial. However, common fault tolerance approaches are not cost-effective for DNN protection, because of the prohibitive overheads caused by the large size of DNNs and the memory required for parameter storage. In this work, we propose a comprehensive framework to assess the fault tolerance of DNNs and protect them cost-effectively. As a first step, the proposed framework performs data-type-and-layer-based fault injection, driven by the DNN characteristics. As a second step, it uses classification-based machine learning methods to predict the criticality not only of network parameters, but also of their individual bits. Last, dedicated Error Correction Codes (ECCs) are selectively inserted to protect the critical parameters and bits, hence protecting the DNNs at low cost. Using the proposed framework, we explored and protected two Convolutional Neural Networks (CNNs), each with four different data encodings. The results show that it is possible to protect the critical network parameters with selective ECCs while saving up to 83% memory w.r.t. conventional ECC approaches.
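
The abstract describes the pipeline only at a high level. To illustrate two ingredients it combines, bit-level fault injection and ECC storage cost, here is a minimal Python sketch; it is not the authors' implementation. It flips each bit of an IEEE 754 float32 weight, labels a bit as critical using a hypothetical threshold on the weight's deviation, and computes how many check bits an extended Hamming (SEC-DED) code needs. The names `flip_bit_float32`, `critical_bits`, `secded_check_bits` and the `threshold` criterion are assumptions for illustration. The framework itself uses data-type-and-layer-based injection and machine-learning classifiers to predict criticality, and neither is reproduced here.

```python
import math
import struct


def flip_bit_float32(value: float, bit: int) -> float:
    """Return `value`, stored as IEEE 754 binary32, with one bit flipped.

    Bit 0 is the least significant mantissa bit, bits 23-30 hold the
    exponent, and bit 31 is the sign bit.
    """
    (word,) = struct.unpack("<I", struct.pack("<f", value))
    (faulty,) = struct.unpack("<f", struct.pack("<I", word ^ (1 << bit)))
    return faulty


def critical_bits(weight: float, threshold: float = 1.0) -> list[int]:
    """Inject a single bit flip at every position of `weight` and return the
    positions whose flip is labelled critical.

    ASSUMPTION (hypothetical criterion, not the paper's): a flip is critical if
    the faulty value is inf/NaN or deviates from the original by more than
    `threshold`. A real assessment would judge the effect on the network output.
    """
    critical = []
    for bit in range(32):
        faulty = flip_bit_float32(weight, bit)
        if not math.isfinite(faulty) or abs(faulty - weight) > threshold:
            critical.append(bit)
    return critical


def secded_check_bits(data_bits: int) -> int:
    """Check bits of an extended Hamming (SEC-DED) code over `data_bits` bits:
    the smallest r with 2**r >= data_bits + r + 1, plus one overall parity bit."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1


if __name__ == "__main__":
    # A small weight: only flipping the exponent MSB (bit 30) blows it up (~1.7e38).
    print(critical_bits(0.5))  # [30]
    # SEC-DED over a full 32-bit word vs. a code covering only 8 of its bits.
    print(secded_check_bits(32), secded_check_bits(8))  # 7 5
```

Labels produced this way, one per (parameter, bit) pair, are the kind of data a classifier could be trained on. The ECC arithmetic shows why selective protection can reduce memory: a conventional (39,32) SEC-DED code adds 7 check bits to every 32-bit word, whereas a code covering only 8 selected bits adds 5. The actual savings of the paper's selective ECCs depend on its own design.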