Non-serial Polyadic Dynamic Programming on a Data-Parallel Many-core Architecture

2011 Symposium on Application Accelerators in High-Performance Computing. Publication date: 2011-07-19. DOI: 10.1109/SAAHPC.2011.25

M. Moazeni, M. Sarrafzadeh, A. Bui


Citations: 1

Abstract

Dynamic programming (DP) is a method for efficiently solving a broad range of search and optimization problems, so techniques for handling large-scale DP problems are often critical to application performance. DP algorithms, however, are often hard to parallelize. In this paper, we address the challenge of exploiting fine-grained parallelism in a family of DP algorithms known as non-serial polyadic. We use an abstract formulation of non-serial polyadic DP derived from RNA secondary structure prediction and matrix parenthesization, two well-known and important problems from this family. We present a load-balancing algorithm that achieves the best overall performance for this type of workload on many-core architectures. A divide-and-conquer approach previously used on multi-core architectures is compared against an iterative version. To evaluate these approaches, we implemented the algorithm on three NVIDIA GPUs using CUDA. We achieved up to 10 GFLOP/s and up to a 228x speedup over a single-threaded CPU implementation. Moreover, the iterative approach yields up to a 3.92x speedup over the divide-and-conquer approach.
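The abstract does not state the recurrence, so the following is only a minimal sequential sketch of the iterative style of non-serial polyadic DP, assuming the textbook matrix parenthesization (matrix-chain) recurrence. The function name `matrix_chain_cost` and the argument `dims` are hypothetical, and the code does not reproduce the authors' CUDA kernels or their load-balancing algorithm.

```python
def matrix_chain_cost(dims):
    """Minimum number of scalar multiplications for a matrix chain.

    Assumption: dims has length n + 1 and matrix i has shape
    dims[i] x dims[i + 1] (the textbook formulation, not the paper's).
    """
    n = len(dims) - 1
    if n <= 0:
        return 0
    # cost[i][j] holds the best cost of multiplying matrices i..j inclusive.
    cost = [[0] * n for _ in range(n)]
    # Iterative version: fill the triangular table one diagonal at a time,
    # where span = j - i.
    for span in range(1, n):
        # Every cell on this diagonal depends only on cells from shorter
        # spans, so the iterations of this loop are mutually independent.
        # This is the loop a data-parallel implementation could distribute.
        for i in range(n - span):
            j = i + span
            # Non-serial: the cell reads from many earlier diagonals.
            # Polyadic: each candidate combines two subproblems.
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j]
                + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
    return cost[0][n - 1]
```

For example, `matrix_chain_cost([10, 30, 5, 60])` evaluates the two possible parenthesizations of a three-matrix chain and returns the cheaper cost, 4500. In this sketch a cell at a given span scans `span` split points while its diagonal holds only `n - span` cells, so per-cell work grows as the number of independent cells shrinks. That uneven shape suggests why load balancing matters for this family of workloads.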

