Highly-reliable integer matrix multiplication via numerical packing

2013 IEEE 19th International On-Line Testing Symposium (IOLTS). Pub Date: 2013-07-08. DOI: 10.1109/IOLTS.2013.6604045

Ijeoma Anarado, M. A. Anam, D. Anastasia, F. Verdicchio, Y. Andreopoulos

Citations: 3

Abstract

The generic matrix multiply (GEMM) routine comprises the compute- and memory-intensive part of many information retrieval, relevance ranking and object recognition systems. Because of the prevalence of GEMM in these applications, ensuring its robustness to transient hardware faults is of paramount importance for highly-efficient/highly-reliable systems. This is currently accomplished via error control coding (ECC) or via dual modular redundancy (DMR) approaches that produce a separate set of “parity” results to allow for fault detection in GEMM. We introduce a third family of methods for fault detection in integer matrix products based on the concept of numerical packing. The key difference between the new approach and the ECC and DMR approaches is that redundant results are produced within the numerical representation of the inputs rather than as a separate set of parity results. In this way, high reliability is ensured within integer matrix products while allowing for: (i) in-place storage; (ii) usage of any off-the-shelf 64-bit floating-point GEMM routine; (iii) computational overhead that is independent of the GEMM inner dimension. The only detriment against a conventional (i.e. fault-intolerant) integer matrix multiplication based on 32-bit floating-point GEMM is the sacrifice of approximately 30.6% of the bitwidth of the numerical representation. However, unlike ECC methods that can reliably detect only up to a few faults per GEMM computation (typically two), the proposed method attains more than “12 nines” reliability, i.e. it will only fail to detect 1 fault out of more than 1 trillion arbitrary faults in the GEMM operations. As such, it achieves reliability that approaches that of DMR, at a very small fraction of its cost. Specifically, a single-threaded software realization of our proposal on an Intel i7-3632QM 2.2GHz processor (Ivy Bridge architecture with AVX support) incurs, on average, only a 19% increase in execution time against an optimized, fault-intolerant, 32-bit GEMM routine over a range of matrix sizes, and it remains more than 80% more efficient than a DMR-based GEMM.
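The abstract does not spell out the packing scheme, so the following is only an illustrative sketch of the general idea of carrying a redundant result inside the numerical representation of an input, and should not be read as the authors' construction. It assumes a simple duplication scheme: every entry of A is multiplied by (1 + 2^26), so that an ordinary 64-bit floating-point product yields each output entry twice, once in the low bit-field and once in the high one, and a mismatch between the two flags a fault. The names `SHIFT`, `pack`, `gemm` and `unpack_and_check`, and the 26-bit field width, are hypothetical choices made for this example; the toy `gemm` merely stands in for an off-the-shelf routine.

```python
# Illustrative sketch only: a toy duplication-style packing, NOT the paper's scheme.
# Assumption: all true output entries satisfy |c| < 2**25, so every intermediate
# value stays below 2**53 and is represented exactly in a 64-bit float.

SHIFT = 2 ** 26        # hypothetical width of one packed bit-field
HALF = SHIFT // 2


def pack(a):
    """Embed a redundant copy of every entry: x -> x + x * 2**26 (as a float)."""
    return [[float(x) * (1 + SHIFT) for x in row] for row in a]


def gemm(a, b):
    """Plain float matrix product, standing in for any 64-bit GEMM routine."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def unpack_and_check(c_packed):
    """Split each packed output into its two copies and compare them."""
    result, faults = [], []
    for i, row in enumerate(c_packed):
        out_row = []
        for j, value in enumerate(row):
            t = int(value)
            low = (t + HALF) % SHIFT - HALF   # signed low bit-field
            high = (t - low) // SHIFT         # high bit-field
            if value != t or low != high:     # copies disagree: flag a fault
                faults.append((i, j))
            out_row.append(low)
        result.append(out_row)
    return result, faults


A = [[1, -2], [3, 4]]
B = [[5, 6], [7, -8]]

c_packed = gemm(pack(A), B)
C, faults = unpack_and_check(c_packed)
print(C, faults)

c_packed[1][0] += 1024.0                      # simulate a transient fault
_, faults_after = unpack_and_check(c_packed)
print(faults_after)
```

In this sketch, packing touches each entry of A once and checking touches each output entry once, so the extra work does not grow with the inner dimension, and no separate parity matrix is stored. The cost is that the usable integer range shrinks, which loosely mirrors the bitwidth trade-off described in the abstract; the actual 30.6% figure comes from the paper's own scheme, not from this example.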
