Brief Announcement: STAR (Space-Time Adaptive and Reductive) Algorithms for Dynamic Programming Recurrences with more than O(1) Dependency

Yuan Tang, Shiyi Wang
DOI: 10.1145/3087556.3087593
Published in: Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2017), 2017-07-24
Citations: 1

Abstract

Striking a space-time balance is important for a real-world algorithm to achieve high performance on modern shared-memory multi-core and many-core systems. However, a large class of dynamic programs with more than O(1) dependency achieves optimality either in space or in time, but not both; in the literature this problem is known as the fundamental space-time tradeoff. We propose the notion of "processor-adaptiveness." In contrast to prior "processor-aware" approaches, ours does not statically partition the problem space across the processor grid; instead, it uses the processor count P only to upper-bound the space and cache requirements, in a cache-oblivious fashion. At the same time, our processor-adaptive algorithms enjoy the full benefits of dynamic load balancing, which is key to achieving satisfactory speedup on a shared-memory system, especially when the problem dimension n is reasonably larger than P. By exploiting the "busy-leaves" property of the runtime scheduler together with a program-managed memory pool that combines the advantages of stack and heap, we show that our STAR (Space-Time Adaptive and Reductive) technique helps these dynamic programs achieve sublinear time bounds while remaining asymptotically work-, space-, and cache-optimal. The key achievement of this paper is the first GAP algorithm with sublinear O(n^{3/4} log n) time and optimal O(n^3) work. If we further bound the algorithm's space and cache requirements to be asymptotically optimal, the time bound increases by a factor of P without sacrificing the work bound; if P = o(n^{1/4} / log n), the time bound stays sublinear and may offer a better tradeoff between time and space requirements in practice.
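To make the phrase "more than O(1) dependency" concrete, the classic GAP recurrence (the problem the paper's headline result targets) has each cell depend on an entire row prefix and an entire column prefix, i.e. O(n) earlier cells rather than a constant number. The following is a minimal serial sketch of that recurrence with illustrative cost functions; it shows the O(n^3)-work baseline only and does not reproduce the paper's parallel STAR technique, scheduler, or memory pool.

```python
def gap_dp(x, y, match_cost, gap_cost):
    """Serial GAP dynamic program over sequences x and y.

    G[i][j] = min of:
      - G[i-1][j-1] + match_cost(x[i-1], y[j-1])        (align two symbols)
      - min over q < j of G[i][q] + gap_cost(q, j)       (gap in x, O(n) lookback)
      - min over p < i of G[p][j] + gap_cost(p, i)       (gap in y, O(n) lookback)

    The two inner minimizations are the "more than O(1) dependency":
    each cell reads O(n) previously computed cells, giving O(n^3) work
    on an n-by-n table.
    """
    m, n = len(x), len(y)
    INF = float("inf")
    G = [[INF] * (n + 1) for _ in range(m + 1)]
    G[0][0] = 0
    for j in range(1, n + 1):          # leading gap in x
        G[0][j] = gap_cost(0, j)
    for i in range(1, m + 1):          # leading gap in y
        G[i][0] = gap_cost(0, i)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best = G[i - 1][j - 1] + match_cost(x[i - 1], y[j - 1])
            for q in range(j):         # O(n) row-prefix dependency
                best = min(best, G[i][q] + gap_cost(q, j))
            for p in range(i):         # O(n) column-prefix dependency
                best = min(best, G[p][j] + gap_cost(p, i))
            G[i][j] = best
    return G[m][n]

# Illustrative costs (hypothetical, not from the paper): unit mismatch
# penalty and a gap penalty that grows with gap length.
cost = gap_dp("acgt", "agt",
              match_cost=lambda a, b: 0 if a == b else 1,
              gap_cost=lambda p, q: 1 + (q - p))
```

A space-optimal version of this recurrence cannot simply discard old rows, since the column-prefix minimization reaches arbitrarily far back in the table; that is exactly the tension the STAR technique addresses.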