Accelerating schedule space exploration of multi-threaded programs with GPUs

2016 ACM/IEEE International Conference on Formal Methods and Models for System Design (MEMOCODE) Pub Date: 2016-11-18 DOI: 10.1109/MEMCOD.2016.7797754

P. Banga, Atul Pai, Subhajit Roy, Mainak Chaudhuri


Citations: 1

Abstract

Given an input that can trigger a concurrency bug, only a subset of the possible thread schedules, those satisfying certain constraints, can actually cause the bug to manifest. Recent proposals on controlled randomization of thread schedules, with concrete guarantees on bug detection probabilities, have opened promising avenues in this direction. However, to boost the bug detection probability, these techniques typically need to explore a large number of schedules. It is therefore generally beneficial to accelerate the schedule space exploration of multi-threaded programs. In this paper, we introduce Simultaneous Interleaving Exploration with Controlled Sequencing (SINECOSEQ), a generic framework that leverages high-performance graphics processing units (GPUs) to significantly accelerate schedule space navigation of general-purpose multi-threaded programs. The SINE framework accepts POSIX-compliant multi-threaded programs, instruments them to intercept all shared memory accesses, and automatically generates CUDA (Compute Unified Device Architecture) compliant code that navigates the schedule space of the input multi-threaded program on an NVIDIA GPU. Each GPU thread typically explores one schedule of the input program. The COSEQ framework decides how the schedule space is navigated by architecting the schedules on the fly. While it is straightforward to construct and navigate a different schedule on each GPU thread, the resulting technique can perform very poorly because each GPU thread executes disparate pieces of code, leading to full control divergence. We demonstrate one application of SINECOSEQ by proposing a new GPU-friendly scheduler for accelerated concurrency testing (ACT), inspired by the recently proposed randomized scheduler of probabilistic concurrency testing (PCT). Compared to the state-of-the-art parallel PCT (PPCT) implementation on a twelve-core CPU, our proposal, implemented on an NVIDIA Kepler K20c GPU card, significantly speeds up schedule space exploration for eight multi-threaded applications and kernels drawn from the Phoenix and PARSEC suites.
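To make the idea of exploring one randomized schedule per worker concrete, the following is a minimal CPU-side sketch in Python. It is not the SINECOSEQ framework or the ACT scheduler, whose details the abstract does not give. It is a simplified priority-based scheduler in the style of PCT, applied to a hypothetical toy program in which two threads each perform a non-atomic increment (a load followed by a store). Each seed stands in for the single schedule that one GPU thread would explore. The names `run_schedule` and `explore`, and the choice of distinct change points, are assumptions made for illustration.

```python
import random


def run_schedule(seed, n_threads=2, depth=2):
    """Run one PCT-style schedule of a toy racy program; return the final counter."""
    rng = random.Random(seed)
    steps_per_thread = 2                  # step 0: load, step 1: store
    k = n_threads * steps_per_thread      # total number of scheduled steps

    # Initial priorities: a random permutation of depth .. depth + n_threads - 1.
    priorities = list(range(depth, depth + n_threads))
    rng.shuffle(priorities)

    # depth - 1 priority change points in [1, k]; the i-th one lowers the
    # running thread's priority to i (assumption: points are chosen distinct).
    points = rng.sample(range(1, k + 1), depth - 1)
    change_at = {step: i + 1 for i, step in enumerate(points)}

    counter = 0                           # shared variable
    pc = [0] * n_threads                  # next step of each toy thread
    local = [0] * n_threads               # per-thread copy of the loaded value
    for step in range(1, k + 1):
        enabled = [t for t in range(n_threads) if pc[t] < steps_per_thread]
        t = max(enabled, key=lambda i: priorities[i])  # highest priority runs
        if pc[t] == 0:
            local[t] = counter            # load the shared counter
        else:
            counter = local[t] + 1        # store the incremented value
        pc[t] += 1
        if step in change_at:
            priorities[t] = change_at[step]
    return counter


def explore(num_schedules):
    """Return the seeds whose schedule exposes the lost update (counter != 2)."""
    return [s for s in range(num_schedules) if run_schedule(s) != 2]


if __name__ == "__main__":
    buggy = explore(1000)
    print(f"{len(buggy)} of 1000 schedules exposed the lost update")
```

In this toy setting the lost update appears only when the single change point falls right after the first load, so most schedules miss it. That is the reason exploring many schedules in parallel pays off. On a GPU, the loop over seeds would map to one schedule per GPU thread, and the data-dependent choice of which toy thread runs next is the kind of branching that can cause the control divergence the abstract describes.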

Source journal

2016 ACM/IEEE International Conference on Formal Methods and Models for System Design (MEMOCODE)

Self-citation rate

0.00%

Publication count