ALBERTA: ALgorithm-Based Error Resilience in Transformer Architectures
Haoxuan Liu; Vasu Singh; Michał Filipiuk; Siva Kumar Sastry Hari
IEEE Open Journal of the Computer Society, vol. 6, pp. 85-96, 2024. DOI: 10.1109/OJCS.2024.3400696
https://ieeexplore.ieee.org/document/10530530/
Citations: 0
Abstract
Vision Transformers are increasingly deployed in safety-critical applications that demand high reliability. Ensuring the correct execution of these models on GPUs is critical, despite the potential for transient hardware errors. We propose a novel algorithm-based resilience framework called ALBERTA that allows us to perform end-to-end resilience analysis and protection of transformer-based architectures. First, our work develops an efficient process for computing and ranking the resilience of transformer layers. Due to the large size of transformer models, applying traditional network redundancy to a subset of the most vulnerable layers provides high error coverage, albeit with impractically high overhead. We address this shortcoming by providing a software-directed, checksum-based error detection technique aimed at protecting the most vulnerable general matrix multiply (GEMM) layers in transformer models that use either floating-point or integer arithmetic. Results show that our approach achieves over 99% coverage for errors (single bit-flip fault model) that result in a mismatch, with $<$0.2% and $<$0.01% computation and memory overheads, respectively. Lastly, we present the applicability of our framework to various modern GPU architectures under different numerical precisions. We introduce an efficient self-correction mechanism for resolving erroneous detections with an average of less than 2% overhead per error.
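The checksum-based GEMM protection the abstract describes is, at its core, the classic algorithm-based fault tolerance (ABFT) identity for matrix multiplication: the column sums of C = A·B can be predicted as (1ᵀA)·B, and the row sums as A·(B·1). The NumPy sketch below illustrates only that identity, not ALBERTA's actual GPU or integer-arithmetic implementation; the function name `abft_check`, the matrix shapes, and the tolerance `tol` are illustrative assumptions.

```python
import numpy as np

def abft_check(A, B, C, tol=1e-3):
    """Verify C == A @ B via ABFT checksums; correct a single bad element.

    Checksum identity: ones @ (A @ B) == (ones @ A) @ B, so the column
    (and row) sums of C are predictable from A and B in O(n^2) extra
    work, versus O(n^3) for the GEMM itself.
    """
    col_diff = C.sum(axis=0) - A.sum(axis=0) @ B   # per-column discrepancy
    row_diff = C.sum(axis=1) - A @ B.sum(axis=1)   # per-row discrepancy

    bad_cols = np.flatnonzero(np.abs(col_diff) > tol)
    bad_rows = np.flatnonzero(np.abs(row_diff) > tol)

    if bad_rows.size == 0 and bad_cols.size == 0:
        return C, None                             # checksums agree: no error
    if bad_rows.size == 1 and bad_cols.size == 1:
        i, j = int(bad_rows[0]), int(bad_cols[0])
        C = C.copy()
        C[i, j] -= row_diff[i]                     # subtract the injected delta
        return C, (i, j)
    raise RuntimeError("multiple mismatches: recompute the GEMM")

# Inject a single-element fault into the GEMM output and recover it.
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 32))
B = rng.standard_normal((32, 48))
C = A @ B
C[5, 7] += 100.0                                   # simulated transient error
fixed, loc = abft_check(A, B, C)
assert loc == (5, 7) and np.allclose(fixed, A @ B)
```

Because the checksum work grows quadratically while the GEMM grows cubically, this style of detection (and the single-element correction it enables) can stay cheap relative to the protected computation, which is consistent with the sub-1% overheads the abstract reports.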