Towards a SYCL API for Approximate Computing

Lorenzo Carpentieri, Biagio Cosenza
Proceedings of the 2023 International Workshop on OpenCL. Published 2023-04-18. DOI: 10.1145/3585341.3585374 (https://doi.org/10.1145/3585341.3585374)

Abstract

Approximate computing is a well-known method [7] for achieving higher performance or lower energy consumption at the cost of reduced output accuracy. Many applications, such as image processing and neural networks, tolerate a certain amount of error and can therefore gain significantly in execution time and energy consumption. The most advanced software approximation techniques are mixed precision, which uses a lower-precision data representation for both integer and floating-point variables [1, 4]; perforation, which skips instruction blocks in a program, iterations in a loop, or data in buffers, assuming that nearby data have similar values [2, 5, 6, 8]; and relaxed synchronization, which removes synchronization points, one of the major bottlenecks in parallel applications [3, 9]. These approaches differ in both the performance they achieve and the error they produce. Usually, perforation and synchronization elision deliver higher performance than mixed precision but produce larger errors. In particular, synchronization elision introduces non-deterministic errors that are complex to handle. The SYCL heterogeneous programming model, often used for developing portable HPC applications, offers some support for approximate computing through a set of built-in functions and data types that can be used to perform approximate operations, such as half-precision floating-point reductions and bit-level operations. In this technical talk, we present SYprox, a SYCL-based API supporting a broad set of approximation techniques in modern C++. SYprox introduces a set of semantics that extend SYCL's buffers and accessors to provide a high-level, easy-to-use programming API. It supports data perforation and elision patterns for efficient approximation, as well as signal reconstruction algorithms for error mitigation.
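To make the perforation idea concrete, the following is a minimal host-side sketch in standard C++ (not SYCL and not the SYprox API). It compares an accurate reduction with a perforated one that visits only every `stride`-th element, relying on the assumption that nearby data have similar values. The function names `exact_mean`, `perforated_mean`, and `relative_error` are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Accurate version: visits every element of the buffer.
double exact_mean(const std::vector<double>& data) {
    assert(!data.empty());
    double sum = 0.0;
    for (double v : data) sum += v;
    return sum / static_cast<double>(data.size());
}

// Perforated version (illustrative assumption): visits only every
// `stride`-th element and averages over the elements actually visited.
double perforated_mean(const std::vector<double>& data, std::size_t stride) {
    assert(!data.empty() && stride >= 1);
    double sum = 0.0;
    std::size_t visited = 0;
    for (std::size_t i = 0; i < data.size(); i += stride) {
        sum += data[i];
        ++visited;
    }
    return sum / static_cast<double>(visited);
}

// Relative error of the approximate result with respect to the exact one.
double relative_error(double approx, double exact) {
    assert(exact != 0.0);
    return std::fabs(approx - exact) / std::fabs(exact);
}
```

With a stride of 2 the loop does half the work; how much error that costs depends entirely on how smooth the input data is.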
Figure 1 (a) depicts the accurate execution of an application, while Figure 1 (b) shows the approximation process: an input buffer is perforated according to the chosen scheme, and the perforated data can be approximated before or after computation using input or output reconstruction, respectively. A code snippet accompanying the talk illustrates the accurate version of a SYCL program and our proposed approximate approach using SYprox. Figure 2 shows a visual representation of the perforation schemes on 1D and 2D buffers. Gray elements are perforated, whereas blue elements are computed. Schemes (a) and (b) apply to 2D buffers and compute a row and a column of results, respectively. Scheme (c) also applies to 2D buffers and perforates data following a checkerboard layout. Finally, scheme (d) works on 1D buffers and perforates data according to a user-defined skip factor. Because perforation introduces errors in the final output, the library also provides two types of reconstruction techniques to mitigate application error: output and input reconstruction. Output reconstruction approximates perforated data by interpolating the output. Input reconstruction, in contrast, approximates perforated data before computation. In this case, the selected perforation scheme defines which data will not be loaded into local memory, and the skipped data are approximated directly in local memory using interpolation. This approach combines local memory optimization with perforation, decreasing the number of global memory accesses, which are a bottleneck in GPU applications. Loading data into local memory requires a synchronization point to ensure that all threads in a block have the same view of the local memory. To decrease the time lost during synchronization, SYprox provides a synchronization elision mechanism that lets the user control the number of synchronization points. Both input and output reconstruction are based on data interpolation.
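The perforation schemes described above can be sketched as simple index predicates. This is a host-side illustration in standard C++, not SYprox code. The exact layouts are assumptions, since the abstract does not define them: here the row and column schemes skip every odd row or column, the checkerboard scheme skips elements where `row + col` is odd, and the 1D scheme keeps only indices that are multiples of the skip factor. All names (`Scheme`, `is_perforated`, `is_perforated_1d`, `perforated_map`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2D perforation schemes (layouts are illustrative assumptions).
enum class Scheme { Rows, Columns, Checkerboard };

// Returns true if element (r, c) is skipped under the given scheme.
bool is_perforated(Scheme s, std::size_t r, std::size_t c) {
    switch (s) {
        case Scheme::Rows:         return r % 2 == 1;        // skip odd rows
        case Scheme::Columns:      return c % 2 == 1;        // skip odd columns
        case Scheme::Checkerboard: return (r + c) % 2 == 1;  // skip alternating cells
    }
    return false;
}

// 1D scheme with a user-defined skip factor (assumption: only indices that
// are multiples of `skip` are computed).
bool is_perforated_1d(std::size_t i, std::size_t skip) {
    assert(skip >= 1);
    return i % skip != 0;
}

// Applies `kernel` only to non-perforated elements of a row-major
// rows x cols buffer, leaving skipped outputs untouched.
// Returns the number of elements actually computed.
template <typename Kernel>
std::size_t perforated_map(const std::vector<float>& in, std::vector<float>& out,
                           std::size_t rows, std::size_t cols, Scheme s,
                           Kernel kernel) {
    assert(in.size() == rows * cols && out.size() == rows * cols);
    std::size_t computed = 0;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            if (is_perforated(s, r, c)) continue;
            out[r * cols + c] = kernel(in[r * cols + c]);
            ++computed;
        }
    }
    return computed;
}
```

Under these assumed layouts each 2D scheme computes half of the elements; the skipped outputs would then be filled in by a reconstruction step.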
Figure 3 shows data reconstruction using three different types of interpolation. Basic interpolation (b) requires that each element to be reconstructed has adjacent elements on both sides. Stencil interpolation (c) requires adjacent elements in all four directions (top, down, left, right). When these requirements are not met, we employ nearest-neighbor interpolation (a), which approximates the data with the nearest element. Since the effectiveness of the reconstruction techniques depends on the perforation strategy adopted and on the input data distribution, SYprox also provides a simple way to implement an ad-hoc perforation strategy that best fits the characteristics of the given input. In this talk, we show a preliminary performance and error evaluation comparing the base implementation of three applications with their approximated versions. Performance-wise, all applications achieve a speedup higher than 2x over the accurate version. On the other hand, the results show that the error introduced by the approximation is highly dependent on how the perforation strategy and reconstruction technique are combined. Even so, the error stays below 10% for all applications.
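The three interpolation types can be illustrated with a small host-side sketch in standard C++ (again, not the SYprox implementation). It assumes a checkerboard perforation in which elements with odd `row + col` were skipped, so every in-bounds direct neighbour of a skipped element holds a computed value. The function name `reconstruct_checkerboard` and the fallback order (stencil, then basic along a row or column, then nearest neighbour) are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reconstructs the skipped elements of a row-major rows x cols buffer.
// Assumption: checkerboard perforation, (r + c) odd means "skipped".
std::vector<float> reconstruct_checkerboard(const std::vector<float>& buf,
                                            std::size_t rows, std::size_t cols) {
    assert(buf.size() == rows * cols);
    std::vector<float> out = buf;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            if ((r + c) % 2 == 0) continue;  // computed element: keep as is
            const bool up = r > 0, down = r + 1 < rows;
            const bool left = c > 0, right = c + 1 < cols;
            const std::size_t i = r * cols + c;
            if (up && down && left && right) {
                // Stencil interpolation: mean of the four direct neighbours.
                out[i] = (buf[i - cols] + buf[i + cols] + buf[i - 1] + buf[i + 1]) / 4.0f;
            } else if (left && right) {
                // Basic interpolation: mean of the two horizontal neighbours.
                out[i] = (buf[i - 1] + buf[i + 1]) / 2.0f;
            } else if (up && down) {
                // Basic interpolation along the column instead.
                out[i] = (buf[i - cols] + buf[i + cols]) / 2.0f;
            } else if (left) {
                out[i] = buf[i - 1];      // nearest neighbour fallbacks
            } else if (right) {
                out[i] = buf[i + 1];
            } else if (up) {
                out[i] = buf[i - cols];
            } else if (down) {
                out[i] = buf[i + cols];
            }
        }
    }
    return out;
}
```

On a linear ramp, the stencil and basic cases recover the skipped values exactly, while the nearest-neighbour fallback at the corners does not, which is consistent with the observation that the error depends on how perforation and reconstruction are combined.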