FAT: Training Neural Networks for Reliable Inference Under Hardware Faults

Ussama Zahid, Giulio Gambardella, Nicholas J. Fraser, Michaela Blott, K. Vissers
{"title":"FAT: Training Neural Networks for Reliable Inference Under Hardware Faults","authors":"Ussama Zahid, Giulio Gambardella, Nicholas J. Fraser, Michaela Blott, K. Vissers","doi":"10.1109/ITC44778.2020.9325249","DOIUrl":null,"url":null,"abstract":"Deep neural networks (DNNs) are state-of-the-art algorithms for multiple applications, spanning from image classification to speech recognition. While providing excellent accuracy, they often have enormous compute and memory requirements. As a result of this, quantized neural networks (QNNs) are increasingly being adopted and deployed especially on embedded devices, thanks to their high accuracy, but also since they have significantly lower compute and memory requirements compared to their floating point equivalents. QNN deployment is also being evaluated for safety-critical applications, such as automotive, avionics, medical or industrial. These systems require functional safety, guaranteeing failure-free behaviour even in the presence of hardware faults. In general fault tolerance can be achieved by adding redundancy to the system, which further exacerbates the overall computational demands and makes it difficult to meet the power and performance requirements. In order to decrease the hardware cost for achieving functional safety, it is vital to explore domain-specific solutions which can exploit the inherent features of DNNs. In this work we present a novel methodology called fault-aware training (FAT), which includes error modeling during neural network (NN) training, to make QNNs resilient to specific fault models on the device. Our experiments show that by injecting faults in the convolutional layers during training, highly accurate convolutional neural networks (CNNs) can be trained which exhibits much better error tolerance compared to the original. Furthermore, we show that redundant systems which are built from QNNs trained with FAT achieve higher worse-case accuracy at lower hardware cost. This has been validated for numerous classification tasks including CIFAR10, GTSRB, SVHN and ImageNet.","PeriodicalId":251504,"journal":{"name":"2020 IEEE International Test Conference (ITC)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE International Test Conference (ITC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ITC44778.2020.9325249","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 20

Abstract

Deep neural networks (DNNs) are state-of-the-art algorithms for multiple applications, ranging from image classification to speech recognition. While providing excellent accuracy, they often have enormous compute and memory requirements. As a result, quantized neural networks (QNNs) are increasingly being adopted and deployed, especially on embedded devices, thanks to their high accuracy but also their significantly lower compute and memory requirements compared to their floating-point equivalents. QNN deployment is also being evaluated for safety-critical applications, such as automotive, avionics, medical or industrial. These systems require functional safety, guaranteeing failure-free behaviour even in the presence of hardware faults. In general, fault tolerance can be achieved by adding redundancy to the system, which further exacerbates the overall computational demands and makes it difficult to meet the power and performance requirements. In order to decrease the hardware cost of achieving functional safety, it is vital to explore domain-specific solutions which can exploit the inherent features of DNNs. In this work we present a novel methodology called fault-aware training (FAT), which includes error modeling during neural network (NN) training, to make QNNs resilient to specific fault models on the device. Our experiments show that by injecting faults into the convolutional layers during training, highly accurate convolutional neural networks (CNNs) can be trained which exhibit much better error tolerance than the original. Furthermore, we show that redundant systems built from QNNs trained with FAT achieve higher worst-case accuracy at lower hardware cost. This has been validated for numerous classification tasks including CIFAR10, GTSRB, SVHN and ImageNet.
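To illustrate the general idea of injecting faults during training, below is a minimal, hypothetical PyTorch sketch. It is not the paper's implementation: the `FaultInjectingConv2d` wrapper, the `fault_rate` and `stuck_value` parameters, and the simple stuck-at fault model on activations are illustrative assumptions; the paper targets specific hardware fault models in quantized accelerators.

```python
import torch
import torch.nn as nn

class FaultInjectingConv2d(nn.Module):
    """Hypothetical sketch: wraps a Conv2d and forces a random fraction of its
    output activations to a stuck value during training, so the network learns
    weights that tolerate similar faults at inference time."""

    def __init__(self, in_ch, out_ch, kernel_size, fault_rate=0.01,
                 stuck_value=0.0, **kw):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, **kw)
        self.fault_rate = fault_rate    # probability that an output element is faulty (assumed)
        self.stuck_value = stuck_value  # value a faulty element is forced to (assumed)

    def forward(self, x):
        y = self.conv(x)
        if self.training and self.fault_rate > 0:
            # Draw a fresh fault mask every forward pass, mimicking a different
            # fault pattern per batch.
            fault_mask = torch.rand_like(y) < self.fault_rate
            y = torch.where(fault_mask, torch.full_like(y, self.stuck_value), y)
        return y

# Usage: a drop-in replacement for nn.Conv2d inside a CNN trained as usual;
# faults are injected only while the module is in training mode.
layer = FaultInjectingConv2d(3, 16, kernel_size=3, padding=1, fault_rate=0.02)
out = layer(torch.randn(8, 3, 32, 32))
```

The design choice sketched here is to perturb activations rather than weights so that the rest of the training loop (optimizer, loss, quantization) stays unchanged; the fault model and injection site would need to match the actual target hardware.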