Dependence-Based Code Transformation for Coarse-Grained Parallelism

Proceedings of the 2015 International Workshop on Code Optimisation for Multi and Many Cores. Pub Date: 2015-02-08. DOI: 10.1145/2723772.2723777

Bo Zhao, Zhen Li, A. Jannesari, F. Wolf, Weiguo Wu


Citations: 9

Abstract

Multicore architectures are becoming more common today. Many software products implemented sequentially have failed to exploit the potential parallelism of multicore architectures. Significant re-engineering and refactoring of existing software is needed to support the use of new hardware features. Due to the high cost of manual transformation, an automated approach to transforming existing software and taking advantage of multicore architectures would be highly beneficial. We propose a novel auto-parallelization approach, which integrates data-dependence profiling, task parallelism extraction, and source-to-source transformation. Coarse-grained task parallelism is detected based on a concept called the Computational Unit (CU). We use dynamic profiling information to gather control and data dependences among tasks and generate a task graph. In addition, we develop a source-to-source transformation tool based on LLVM, which can perform high-level code restructuring. Using the loop parallelism and task parallelism recorded in the generated task graph, it transforms the sequential code into parallel code that uses Intel Threading Building Blocks (TBB). We have evaluated NAS Parallel Benchmark applications, three applications from the PARSEC benchmark suite, and real-world applications. The obtained results confirm that our approach is able to achieve promising performance with minor user intervention. The average speedups of loop parallelization and task parallelization are 3.12x and 9.92x, respectively.
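To make the task-graph idea in the abstract concrete, the following is a minimal sketch, not the authors' tool. It assumes each task already comes with hand-written read and write sets (the paper obtains dependences by dynamic profiling instead), derives read-after-write, write-after-read, and write-after-write dependences in program order, and groups the tasks into stages that could run concurrently. The names `build_task_graph`, `parallel_stages`, and the example tasks are all hypothetical.

```python
def build_task_graph(tasks):
    """Derive dependences between tasks listed in sequential program order.

    tasks: list of (name, reads, writes), where reads and writes are sets
    of variable names. Returns a dict mapping each task name to the set of
    earlier tasks it must wait for.
    """
    deps = {name: set() for name, _, _ in tasks}
    for j, (later, reads_j, writes_j) in enumerate(tasks):
        for earlier, reads_i, writes_i in tasks[:j]:
            raw = writes_i & reads_j   # later task reads what the earlier one wrote
            war = reads_i & writes_j   # later task overwrites what the earlier one read
            waw = writes_i & writes_j  # both tasks write the same variable
            if raw or war or waw:
                deps[later].add(earlier)
    return deps


def parallel_stages(deps):
    """Group tasks into stages; tasks within one stage are mutually independent."""
    remaining = {task: set(d) for task, d in deps.items()}
    stages = []
    while remaining:
        ready = sorted(task for task, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependence cycle detected")
        stages.append(ready)
        for task in ready:
            del remaining[task]
        for d in remaining.values():
            d.difference_update(ready)
    return stages


# Hypothetical tasks: two independent pipelines that are combined at the end.
tasks = [
    ("load_a", set(), {"a"}),
    ("load_b", set(), {"b"}),
    ("scale_a", {"a"}, {"a2"}),
    ("scale_b", {"b"}, {"b2"}),
    ("combine", {"a2", "b2"}, {"out"}),
]

deps = build_task_graph(tasks)
stages = parallel_stages(deps)
print(stages)
```

Here the two load tasks form one stage, the two scale tasks the next, and `combine` runs last. In a TBB setting, each stage could plausibly be expressed as tasks run in a `tbb::task_group` followed by a wait; that mapping is an illustration on our part and is not specified in the abstract.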


