A sound and complete abstraction for reasoning about parallel prefix sums

Nathan Chong, A. Donaldson, J. Ketema
{"title":"A sound and complete abstraction for reasoning about parallel prefix sums","authors":"Nathan Chong, A. Donaldson, J. Ketema","doi":"10.1145/2535838.2535882","DOIUrl":null,"url":null,"abstract":"Prefix sums are key building blocks in the implementation of many concurrent software applications, and recently much work has gone into efficiently implementing prefix sums to run on massively parallel graphics processing units (GPUs). Because they lie at the heart of many GPU-accelerated applications, the correctness of prefix sum implementations is of prime importance. We introduce a novel abstraction, the interval of summations, that allows scalable reasoning about implementations of prefix sums. We present this abstraction as a monoid, and prove a soundness and completeness result showing that a generic sequential prefix sum implementation is correct for an array of length $n$ if and only if it computes the correct result for a specific test case when instantiated with the interval of summations monoid. This allows correctness to be established by running a single test where the input and result require O(n lg(n)) space. This improves upon an existing result by Sheeran where the input requires O(n lg(n)) space and the result O(n2 \\lg(n)) space, and is more feasible for large n than a method by Voigtlaender that uses O(n) space for the input and result but requires running O(n2) tests. We then extend our abstraction and results to the context of data-parallel programs, developing an automated verification method for GPU implementations of prefix sums. Our method uses static verification to prove that a generic prefix sum implementation is data race-free, after which functional correctness of the implementation can be determined by running a single test case under the interval of summations abstraction. We present an experimental evaluation using four different prefix sum algorithms, showing that our method is highly automatic, scales to large thread counts, and significantly outperforms Voigtlaender's method when applied to large arrays.","PeriodicalId":20683,"journal":{"name":"Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2014-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"29","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2535838.2535882","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 29

Abstract

Prefix sums are key building blocks in the implementation of many concurrent software applications, and recently much work has gone into efficiently implementing prefix sums to run on massively parallel graphics processing units (GPUs). Because they lie at the heart of many GPU-accelerated applications, the correctness of prefix sum implementations is of prime importance. We introduce a novel abstraction, the interval of summations, that allows scalable reasoning about implementations of prefix sums. We present this abstraction as a monoid, and prove a soundness and completeness result showing that a generic sequential prefix sum implementation is correct for an array of length $n$ if and only if it computes the correct result for a specific test case when instantiated with the interval of summations monoid. This allows correctness to be established by running a single test where the input and result require O(n lg(n)) space. This improves upon an existing result by Sheeran where the input requires O(n lg(n)) space and the result O(n2 \lg(n)) space, and is more feasible for large n than a method by Voigtlaender that uses O(n) space for the input and result but requires running O(n2) tests. We then extend our abstraction and results to the context of data-parallel programs, developing an automated verification method for GPU implementations of prefix sums. Our method uses static verification to prove that a generic prefix sum implementation is data race-free, after which functional correctness of the implementation can be determined by running a single test case under the interval of summations abstraction. We present an experimental evaluation using four different prefix sum algorithms, showing that our method is highly automatic, scales to large thread counts, and significantly outperforms Voigtlaender's method when applied to large arrays.
关于并行前缀和推理的一个健全和完整的抽象
前缀和是实现许多并发软件应用程序的关键构建块,最近有很多工作是为了有效地实现在大规模并行图形处理单元(gpu)上运行的前缀和。由于前缀和位于许多gpu加速应用程序的核心,因此前缀和实现的正确性至关重要。我们引入了一个新的抽象,求和区间,它允许对前缀和的实现进行可伸缩的推理。我们将这个抽象表示为一个单群,并证明了一个稳健性和完备性结果,该结果表明对于长度为$n$的数组,一般顺序前缀和实现是正确的,当且仅当它以求和单群区间实例化时计算特定测试用例的正确结果。这允许通过运行单个测试来建立正确性,其中输入和结果需要O(n lg(n))个空间。这改进了Sheeran的现有结果,其中输入需要O(n lg(n))空间,结果需要O(n2 \lg(n))空间,并且对于大n来说比Voigtlaender使用O(n)空间用于输入和结果但需要运行O(n2)个测试的方法更可行。然后,我们将我们的抽象和结果扩展到数据并行程序的上下文中,开发了一种用于GPU实现前缀和的自动验证方法。我们的方法使用静态验证来证明通用前缀和实现是无数据竞争的,之后可以通过在求和抽象的区间内运行单个测试用例来确定实现的功能正确性。我们使用四种不同的前缀和算法进行了实验评估,结果表明我们的方法高度自动化,适用于大线程数,并且在应用于大型数组时明显优于Voigtlaender的方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信