Assessing the Impact of Compiler Optimizations on GPUs Reliability

IF 1.8 3区计算机科学 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Architecture and Code Optimization Pub Date : 2024-01-12 DOI:10.1145/3638249

Fernando Fernandes dos Santos, Luigi Carro, Flavio Vella, Paolo Rech

{"title":"Assessing the Impact of Compiler Optimizations on GPUs Reliability","authors":"Fernando Fernandes dos Santos, Luigi Carro, Flavio Vella, Paolo Rech","doi":"10.1145/3638249","DOIUrl":null,"url":null,"abstract":"<p>Graphics Processing Units (GPUs) compilers have evolved in order to support general-purpose programming languages for multiple architectures. NVIDIA CUDA Compiler (NVCC) has many compilation levels before generating the machine code and applies complex optimizations to improve performance. These optimizations modify how the software is mapped in the underlying hardware; thus, as we show in this paper, they can also affect GPU reliability. We evaluate the effects on the GPU error rate of the optimization flags applied at the NVCC Parallel Thread Execution (PTX) compiling phase by analyzing two NVIDIA GPU architectures (Kepler and Volta) and two compiler versions (NVCC 10.2 and 11.3). We compare and combine fault propagation analysis based on software fault injection, hardware utilization distribution obtained with application-level profiling, and machine instructions radiation-induced error rate measured with beam experiments. We consider eight different workloads and 144 combinations of compilation flags, and we show that optimizations can impact the GPUs’ error rate of up to an order of magnitude. Additionally, through accelerated neutron beam experiments on a NVIDIA Kepler GPU, we show that the error rate of the unoptimized GEMM (-O0 flag) is lower than the optimized GEMM’s (-O3 flag) error rate. When the performance is evaluated together with the error rate, we show that the most optimized versions (-O1 and -O3) always produce a higher amount of correct data than the unoptimized code (-O0).</p>","PeriodicalId":50920,"journal":{"name":"ACM Transactions on Architecture and Code Optimization","volume":"42 1","pages":""},"PeriodicalIF":1.8000,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Architecture and Code Optimization","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3638249","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}

引用次数: 0

Abstract

Graphics Processing Units (GPUs) compilers have evolved in order to support general-purpose programming languages for multiple architectures. NVIDIA CUDA Compiler (NVCC) has many compilation levels before generating the machine code and applies complex optimizations to improve performance. These optimizations modify how the software is mapped in the underlying hardware; thus, as we show in this paper, they can also affect GPU reliability. We evaluate the effects on the GPU error rate of the optimization flags applied at the NVCC Parallel Thread Execution (PTX) compiling phase by analyzing two NVIDIA GPU architectures (Kepler and Volta) and two compiler versions (NVCC 10.2 and 11.3). We compare and combine fault propagation analysis based on software fault injection, hardware utilization distribution obtained with application-level profiling, and machine instructions radiation-induced error rate measured with beam experiments. We consider eight different workloads and 144 combinations of compilation flags, and we show that optimizations can impact the GPUs’ error rate of up to an order of magnitude. Additionally, through accelerated neutron beam experiments on a NVIDIA Kepler GPU, we show that the error rate of the unoptimized GEMM (-O0 flag) is lower than the optimized GEMM’s (-O3 flag) error rate. When the performance is evaluated together with the error rate, we show that the most optimized versions (-O1 and -O3) always produce a higher amount of correct data than the unoptimized code (-O0).

查看原文本刊更多论文

评估编译器优化对 GPU 可靠性的影响

图形处理器（GPU）编译器不断发展，以支持多种架构的通用编程语言。英伟达™（NVIDIA®）CUDA 编译器（NVCC）在生成机器代码之前有许多编译级别，并应用复杂的优化来提高性能。这些优化修改了软件在底层硬件中的映射方式；因此，正如我们在本文中所展示的，它们也会影响 GPU 的可靠性。我们通过分析两种英伟达™（NVIDIA®）GPU架构（Kepler和Volta）和两个编译器版本（NVCC 10.2和11.3），评估了在NVCC并行线程执行（PTX）编译阶段应用的优化标志对GPU错误率的影响。我们比较并结合了基于软件故障注入的故障传播分析、通过应用级剖析获得的硬件利用率分布以及通过光束实验测量的机器指令辐射诱发错误率。我们考虑了八种不同的工作负载和 144 种编译标志组合，结果表明，优化可对 GPU 的错误率产生高达一个数量级的影响。此外，通过在英伟达开普勒GPU上进行加速中子束实验，我们发现未优化GEMM（-O0标志）的错误率低于优化GEMM（-O3标志）的错误率。当性能与错误率一起评估时，我们发现最优化的版本（-O1 和 -O3）产生的正确数据量总是高于未优化的代码（-O0）。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

ACM Transactions on Architecture and Code Optimization 工程技术-计算机：理论方法

CiteScore

3.60

自引率

6.20%

发文量

审稿时长

6-12 weeks

期刊介绍： ACM Transactions on Architecture and Code Optimization (TACO) focuses on hardware, software, and system research spanning the fields of computer architecture and code optimization. Articles that appear in TACO will either present new techniques and concepts or report on experiences and experiments with actual systems. Insights useful to architects, hardware or software developers, designers, builders, and users will be emphasized.