{"title":"简要公告:具有大于O(1)依赖性的动态规划递归的STAR(时空自适应和简化)算法","authors":"Yuan Tang, Shiyi Wang","doi":"10.1145/3087556.3087593","DOIUrl":null,"url":null,"abstract":"It's important to hit a space-time balance for a real-world algorithm to achieve high performance on modern shared-memory multi-core and many-core systems. However, a large class of dynamic programs with more than O(1) dependency achieved optimality either in space or time, but not both. In the literature, the problem is known as the fundamental space-time tradeoff. We propose the notion of \"Processor-Adaptiveness.\" In contrast to the prior \"Processor-Awareness\", our approach does not partition statically the problem space to the processor grid, but uses the processor count P to just upper bound the space and cache requirement in a cache-oblivious fashion. In the meantime, our processor-adaptive algorithms enjoy the full benefits of \"dynamic load-balance\", which is a key to achieving satisfactory speedup on a shared-memory system, especially when the problem dimension n is reasonably larger than P. By utilizing the \"busy-leaves\" property of runtime scheduler and a program managed memory pool that combines the advantages of stack and heap, we show that our STAR (Space-Time Adaptive and Reductive) technique can help these dynamic programs to achieve sublinear time bounds while keeping to be asymptotically work-, space-, and cache-optimal. The key achievement of this paper is to obtain the first sublinear O(n3/4 log n) time and optimal O(n3) work GAP algorithm; If we further bound the space and cache requirement of the algorithm to be asymptotically optimal, there will be a factor of P increase in time bound without sacrificing the work bound. If P = o(n1/4 / log n), the time bound stays sublinear and may be a better tradeoff between time and space requirements in practice.","PeriodicalId":162994,"journal":{"name":"Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"483 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Brief Announcement: STAR (Space-Time Adaptive and Reductive) Algorithms for Dynamic Programming Recurrences with more than O(1) Dependency\",\"authors\":\"Yuan Tang, Shiyi Wang\",\"doi\":\"10.1145/3087556.3087593\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"It's important to hit a space-time balance for a real-world algorithm to achieve high performance on modern shared-memory multi-core and many-core systems. However, a large class of dynamic programs with more than O(1) dependency achieved optimality either in space or time, but not both. In the literature, the problem is known as the fundamental space-time tradeoff. We propose the notion of \\\"Processor-Adaptiveness.\\\" In contrast to the prior \\\"Processor-Awareness\\\", our approach does not partition statically the problem space to the processor grid, but uses the processor count P to just upper bound the space and cache requirement in a cache-oblivious fashion. In the meantime, our processor-adaptive algorithms enjoy the full benefits of \\\"dynamic load-balance\\\", which is a key to achieving satisfactory speedup on a shared-memory system, especially when the problem dimension n is reasonably larger than P. 
By utilizing the \\\"busy-leaves\\\" property of runtime scheduler and a program managed memory pool that combines the advantages of stack and heap, we show that our STAR (Space-Time Adaptive and Reductive) technique can help these dynamic programs to achieve sublinear time bounds while keeping to be asymptotically work-, space-, and cache-optimal. The key achievement of this paper is to obtain the first sublinear O(n3/4 log n) time and optimal O(n3) work GAP algorithm; If we further bound the space and cache requirement of the algorithm to be asymptotically optimal, there will be a factor of P increase in time bound without sacrificing the work bound. If P = o(n1/4 / log n), the time bound stays sublinear and may be a better tradeoff between time and space requirements in practice.\",\"PeriodicalId\":162994,\"journal\":{\"name\":\"Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures\",\"volume\":\"483 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-07-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3087556.3087593\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3087556.3087593","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Brief Announcement: STAR (Space-Time Adaptive and Reductive) Algorithms for Dynamic Programming Recurrences with more than O(1) Dependency
Striking a space-time balance is essential for a real-world algorithm to achieve high performance on modern shared-memory multi-core and many-core systems. However, a large class of dynamic programs with more than O(1) dependency has achieved optimality in either space or time, but not both; in the literature, this problem is known as the fundamental space-time tradeoff. We propose the notion of "processor-adaptiveness." In contrast to prior "processor-aware" approaches, ours does not statically partition the problem space across the processor grid, but uses the processor count P only to upper-bound the space and cache requirements in a cache-oblivious fashion. At the same time, our processor-adaptive algorithms enjoy the full benefits of dynamic load balancing, which is key to achieving satisfactory speedup on a shared-memory system, especially when the problem dimension n is reasonably larger than P. By exploiting the "busy-leaves" property of the runtime scheduler and a program-managed memory pool that combines the advantages of the stack and the heap, we show that our STAR (Space-Time Adaptive and Reductive) technique helps these dynamic programs achieve sublinear time bounds while remaining asymptotically work-, space-, and cache-optimal. The key result of this paper is the first GAP algorithm with sublinear O(n^{3/4} log n) time and optimal O(n^3) work. If we further bound the algorithm's space and cache requirements to be asymptotically optimal, the time bound increases by a factor of P without sacrificing the work bound; if P = o(n^{1/4} / log n), the time bound remains sublinear and may offer a better tradeoff between time and space requirements in practice.
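For readers unfamiliar with why GAP falls into the "more than O(1) dependency" class, the following is a minimal sequential sketch of the standard GAP recurrence: each cell G[i][j] depends on an entire row prefix and an entire column prefix, so a naive evaluation performs Theta(n^3) work on n-by-n inputs. The cost functions s(), w(), w2() and the sample strings are hypothetical placeholders; this sketch only illustrates the recurrence structure and does not implement the paper's STAR technique.

```cpp
#include <vector>
#include <string>
#include <algorithm>
#include <limits>
#include <iostream>

using Cost = long long;
static const Cost INF = std::numeric_limits<Cost>::max() / 4;

Cost s(char a, char b) { return a == b ? 0 : 1; }   // substitution cost (placeholder)
Cost w(int q, int j)   { return 2 + (j - q); }      // horizontal gap cost (placeholder)
Cost w2(int p, int i)  { return 2 + (i - p); }      // vertical gap cost (placeholder)

// GAP recurrence:
//   G[i][j] = min( G[i-1][j-1] + s(x_i, y_j),
//                  min_{0 <= q < j} G[i][q] + w(q, j),
//                  min_{0 <= p < i} G[p][j] + w2(p, i) )
Cost gap_dp(const std::string& x, const std::string& y) {
    int m = x.size(), n = y.size();
    std::vector<std::vector<Cost>> G(m + 1, std::vector<Cost>(n + 1, INF));
    G[0][0] = 0;
    for (int j = 1; j <= n; ++j) G[0][j] = w(0, j);
    for (int i = 1; i <= m; ++i) G[i][0] = w2(0, i);
    for (int i = 1; i <= m; ++i) {
        for (int j = 1; j <= n; ++j) {
            Cost best = G[i - 1][j - 1] + s(x[i - 1], y[j - 1]);
            for (int q = 0; q < j; ++q)        // O(n) dependency along the row
                best = std::min(best, G[i][q] + w(q, j));
            for (int p = 0; p < i; ++p)        // O(n) dependency along the column
                best = std::min(best, G[p][j] + w2(p, i));
            G[i][j] = best;
        }
    }
    return G[m][n];
}

int main() {
    std::cout << gap_dp("ACGT", "AGT") << "\n";  // small illustrative example
}
```

The inner minimizations over q and p are what distinguish GAP from O(1)-dependency dynamic programs such as edit distance; they are also what makes simultaneous space and time optimality hard, which is the tradeoff the announcement addresses.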