Accelerating inclusion-based pointer analysis on heterogeneous CPU-GPU systems

20th Annual International Conference on High Performance Computing | Pub Date: 2013-12-01 | DOI: 10.1109/HiPC.2013.6799110

Yu Su, Ding Ye, Jingling Xue


Citations: 12

Abstract

This paper describes the first implementation of Andersen's inclusion-based pointer analysis for C programs on a heterogeneous CPU-GPU system in which both the CPU and GPU cores are used. As an important graph algorithm, Andersen's analysis is difficult to parallelise because it makes extensive modifications to the structure of the underlying graph, in a way that is highly input-dependent and hard to analyse statically. Existing parallel solutions run on either the CPU or the GPU but not both, leaving the underlying computational resources underutilised and making the ratio of CPU-only to GPU-only speedup unpredictable for certain programs (i.e., graphs). We observe that a naive parallel solution of Andersen's analysis on a CPU-GPU system suffers from poor performance due to workload imbalance. We introduce a solution that is centred around a new dynamic workload distribution scheme. The novelty lies in prioritising the distribution of different types of workloads (i.e., the graph-rewriting rules in Andersen's analysis) to the CPU or the GPU according to how well suited each processing unit is to handling them. This scheme is effective when combined with synchronisation-free execution of tasks (i.e., graph-rewriting rules) and difference propagation of points-to information between the CPU and GPU. On a set of seven C benchmarks, our CPU-GPU solution outperforms (on average) (1) the CPU-only solution by 50.6%, (2) the GPU-only solution by 78.5%, and (3) an oracle solution that behaves as the faster of (1) and (2) on every benchmark by 34.6%.
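To make the terms "graph-rewriting rules" and "difference propagation" concrete, the following is a minimal sequential sketch of an Andersen-style solver in Python. It is not the authors' CPU-GPU implementation and does not model their workload distribution scheme; the function name `andersen`, the constraint encoding as tuples, and the variable names in the example are all assumptions made for illustration. The load and store rules rewrite the graph by adding copy edges, and each node forwards only the targets it has not yet propagated (its delta) rather than its full points-to set.

```python
from collections import defaultdict, deque


def andersen(addr_of, copies, loads, stores):
    """Illustrative worklist solver for inclusion constraints.

    addr_of: (p, a) pairs for  p = &a
    copies:  (p, q) pairs for  p = q
    loads:   (p, q) pairs for  p = *q
    stores:  (p, q) pairs for  *p = q
    """
    pts = defaultdict(set)           # full points-to set of each variable
    delta = defaultdict(set)         # targets not yet propagated onwards
    succ = defaultdict(set)          # copy edge q -> p means pts(q) is a subset of pts(p)
    loads_from = defaultdict(list)   # q -> [p, ...] for each p = *q
    stores_to = defaultdict(list)    # p -> [q, ...] for each *p = q
    work = deque()

    def add_targets(v, targets):
        new = targets - pts[v]
        if new:
            pts[v] |= new
            delta[v] |= new
            work.append(v)

    def add_edge(src, dst):
        # Graph rewriting: a new edge carries the full set of src once.
        if dst not in succ[src]:
            succ[src].add(dst)
            add_targets(dst, pts[src])

    for p, q in copies:
        succ[q].add(p)
    for p, q in loads:
        loads_from[q].append(p)
    for p, q in stores:
        stores_to[p].append(q)
    for p, a in addr_of:
        add_targets(p, {a})

    while work:
        v = work.popleft()
        diff, delta[v] = delta[v], set()
        if not diff:
            continue
        for a in diff:
            for p in loads_from[v]:   # p = *v and a in pts(v): add edge a -> p
                add_edge(a, p)
            for q in stores_to[v]:    # *v = q and a in pts(v): add edge q -> a
                add_edge(q, a)
        for w in list(succ[v]):       # difference propagation along copy edges
            add_targets(w, diff)

    return {v: s for v, s in pts.items() if s}


# Hypothetical input:  a = &x; b = &y; p = &a; *p = b; c = *p; d = c
result = andersen(
    addr_of=[("a", "x"), ("b", "y"), ("p", "a")],
    copies=[("d", "c")],
    loads=[("c", "p")],
    stores=[("p", "b")],
)
print(result)
```

In this example the store through `p` adds the edge `b -> a` and the load through `p` adds the edge `a -> c`, so `a`, `c`, and `d` all end up pointing to both `x` and `y`. These edge insertions depend on the points-to sets computed so far, which is the input-dependent graph modification that the abstract identifies as the obstacle to parallelisation.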

