Semi-automatic extraction and exploitation of hierarchical pipeline parallelism using profiling information

2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT). Published 2010-09-11. DOI: 10.1145/1854273.1854321

Georgios Tournavitis, Björn Franke

Citations: 57

Abstract

In recent years, multi-core computer systems have left the realm of high-performance computing: virtually all of today's desktop computers and embedded computing systems are equipped with several processing cores. Still, no single parallel programming model has found widespread support, and parallel programming remains an art for the majority of application programmers. In addition, there exists a plethora of sequential legacy applications for which automatic parallelization is the only hope of benefiting from the increased processing power of modern multi-core systems. In the past, automatic parallelization has largely focused on data parallelism. In this paper we present a novel approach to extracting and exploiting pipeline parallelism from sequential applications. We use profiling to overcome the limitations of static data and control flow analysis, enabling more aggressive parallelization. Our approach is orthogonal to existing automatic parallelization approaches, and additional data parallelism may be exploited in the individual pipeline stages. The key contribution of this paper is a whole-program representation that supports profiling, parallelism extraction, and exploitation. We demonstrate how this representation enhances conventional pipeline parallelization by supporting multi-level loops and pipeline stage replication in a uniform and automatic way. We have evaluated our methodology on a set of multimedia and stream processing benchmarks and demonstrate speedups of up to 4.7 on an eight-core Intel Xeon machine.
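The core idea of pipeline parallelism with stage replication can be illustrated with a small hand-written sketch. A loop body is split into stages connected by queues, and a stage that carries no cross-iteration state is replicated across several workers. This is not the authors' tool, whole-program representation, or runtime. The function name `run_pipeline`, the three-stage split, the queue sizes, and the use of sequence numbers to restore iteration order are all illustrative assumptions. In the paper's setting, profiling would help establish which stages are free of cross-iteration dependences. Here that property is simply assumed for `transform`. Threads show the structure only: in CPython the global interpreter lock prevents CPU-bound stages from running in parallel, so a real implementation would use native threads or processes.

```python
import heapq
import queue
import threading
from typing import Callable, Iterable, List

_DONE = object()  # sentinel marking the end of the stream


def run_pipeline(items: Iterable[int],
                 transform: Callable[[int], int],
                 replicas: int = 3) -> List[int]:
    """Hypothetical three-stage pipeline: source -> replicated transform -> ordered sink."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    to_workers = queue.Queue(maxsize=16)  # bounded queues give back-pressure
    to_sink = queue.Queue(maxsize=16)
    results: List[int] = []

    def source() -> None:
        # Stage 1 (sequential): tag each item with its iteration number.
        for seq, item in enumerate(items):
            to_workers.put((seq, item))
        for _ in range(replicas):
            to_workers.put(_DONE)  # one sentinel per replica

    def worker() -> None:
        # Stage 2 (replicated): assumed stateless, so any replica may take any item.
        while True:
            msg = to_workers.get()
            if msg is _DONE:
                to_sink.put(_DONE)
                return
            seq, item = msg
            to_sink.put((seq, transform(item)))

    def sink() -> None:
        # Stage 3 (sequential): restore the original iteration order.
        pending: list = []  # min-heap of (seq, value) pairs that arrived early
        next_seq = 0
        finished = 0
        while finished < replicas:
            msg = to_sink.get()
            if msg is _DONE:
                finished += 1
                continue
            heapq.heappush(pending, msg)
            while pending and pending[0][0] == next_seq:
                results.append(heapq.heappop(pending)[1])
                next_seq += 1

    threads = [threading.Thread(target=source), threading.Thread(target=sink)]
    threads += [threading.Thread(target=worker) for _ in range(replicas)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_pipeline(range(10), lambda x: x * x, replicas=4))
# prints [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Replicating only the middle stage keeps the sequential stages, which may hold state such as a file position or output stream, single-threaded. The sequence numbers let the final stage emit results in the same order as the original sequential loop.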
