Efficient Fault-Criticality Analysis for AI Accelerators using a Neural Twin

2021 IEEE International Test Conference (ITC). Pub Date: 2021-10-01. DOI: 10.1109/ITC50571.2021.00015

Arjun Chaudhuri, Ching-Yuan Chen, Jonti Talukdar, Siddarth Madala, A. K. Dubey, K. Chakrabarty


Citations: 8

Abstract


Owing to the inherent fault tolerance of deep neural network (DNN) models used for classification, many structural faults in the processing elements (PEs) of a systolic array-based AI accelerator are functionally benign. Brute-force fault simulation for determining fault criticality is computationally expensive because the accelerator array contains many potential fault sites and the criticality characterization of PEs depends on the functional input data. Supervised learning techniques can estimate fault criticality accurately, but they require ground truth for model training, and collecting that ground truth involves extensive and computationally expensive fault simulations. We present a framework for analyzing fault criticality with a negligible amount of ground-truth data. We incorporate the gate-level structural and functional information of the PEs in their "neural twins", referred to as "PE-Nets". The PE netlist is translated into a trainable PE-Net, where the standard-cell instances are substituted by their corresponding "Cell-Nets" and the wires translate to neural connections. Each Cell-Net is a pre-trained DNN that models the Boolean-logic behavior of the corresponding standard cell. In the PE-Net, every neural connection is associated with a bias that represents a perturbation in the signal propagated by that connection. We utilize a recently proposed misclassification-driven training algorithm to sensitize and identify biases that are critical to the functioning of the accelerator for a given application workload. The proposed framework achieves up to 100% accuracy in fault-criticality classification in 16-bit and 32-bit PEs by using the criticality knowledge of only 2% of the total faults in a PE.
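The netlist-to-PE-Net translation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the Cell-Nets are hand-set sigmoid neurons rather than pre-trained DNNs, the netlist is a toy 1-bit full adder rather than a real PE, and the names `pe_net`, `wire_bias`, `mismatches`, and `GAIN` are hypothetical. Clamping a biased signal to [0, 1], so that a bias of -1.0 behaves like a stuck-at-0 fault, is also an illustrative choice and not something the abstract states.

```python
import math
from itertools import product

GAIN = 10.0  # assumed sigmoid steepness, not a value from the paper


def neuron(inputs, weights, offset):
    """One sigmoid neuron: a smooth, differentiable threshold."""
    z = sum(w * x for w, x in zip(weights, inputs)) + offset
    return 1.0 / (1.0 + math.exp(-GAIN * z))


# Hand-set stand-ins for Cell-Nets (hypothetical weights, not trained).
def cell_and(a, b):
    return neuron((a, b), (1.0, 1.0), -1.5)


def cell_or(a, b):
    return neuron((a, b), (1.0, 1.0), -0.5)


def cell_nand(a, b):
    return neuron((a, b), (-1.0, -1.0), 1.5)


def cell_xor(a, b):
    # XOR needs two layers: AND(OR(a, b), NAND(a, b)).
    return cell_and(cell_or(a, b), cell_nand(a, b))


CELLS = {"AND": cell_and, "OR": cell_or, "XOR": cell_xor}

# Toy netlist (a 1-bit full adder) in topological order:
# (cell type, input wires, output wire).
NETLIST = [
    ("XOR", ("a", "b"), "n1"),
    ("XOR", ("n1", "cin"), "sum"),
    ("AND", ("a", "b"), "n2"),
    ("AND", ("n1", "cin"), "n3"),
    ("OR", ("n2", "n3"), "cout"),
]


def pe_net(inputs, wire_bias=None):
    """Evaluate the netlist; wire_bias maps a wire name to an additive bias."""
    wire_bias = wire_bias or {}
    values = dict(inputs)

    def read(wire):
        # Perturb the signal on this wire, then clamp to the logic range [0, 1].
        return min(1.0, max(0.0, values[wire] + wire_bias.get(wire, 0.0)))

    for cell, ins, out in NETLIST:
        values[out] = CELLS[cell](*(read(w) for w in ins))
    return int(read("sum") > 0.5), int(read("cout") > 0.5)


def mismatches(wire_bias):
    """Count input patterns whose outputs differ from the bias-free PE-Net."""
    count = 0
    for a, b, cin in product((0, 1), repeat=3):
        x = {"a": a, "b": b, "cin": cin}
        count += pe_net(x) != pe_net(x, wire_bias)
    return count
```

With all biases at zero, `pe_net` reproduces the full-adder truth table. Calling `mismatches({"n1": -1.0})` perturbs the internal wire `n1` and changes the outputs for only some of the eight input patterns, which is a small-scale picture of why fault criticality depends on the functional input data. In the framework the abstract describes, the biases are sensitized by a misclassification-driven training algorithm rather than set by hand as they are here.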

