{"title":"基于分块算法的自动调优动态规划","authors":"Jiajia Li, Guangming Tan, Mingyu Chen","doi":"10.1109/ICPADS.2010.117","DOIUrl":null,"url":null,"abstract":"As the complexity of current computer architecture increases, domain-specific program generators are extensively used to implement performance portable libraries. Dynamic programming is a performance-critical kernel in many applications including engineering operations and bioinformatics. In this paper, we propose an Automatically Tuned Dynamic Programming (ATDP) to optimize performance of dynamic programming algorithm across various architectures. First, an algorithm-by-blocks for dynamic programming is designed to facilitate optimizing with well-known techniques including cache and register tiling. Further, the parameterized algorithm-by-blocks is cooperative with an auto-tuning framework and leverages a hill climbing algorithm to search the possible best program on a given platform. The experiments on two x86 processors demonstrate that (i) the generated scalar programs improve performance by over 10 times, (ii) the vector programs further speedup the scalar ones by a factor of 4 and 2 for single-precision and double-precision, respectively.","PeriodicalId":365914,"journal":{"name":"2010 IEEE 16th International Conference on Parallel and Distributed Systems","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Automatically Tuned Dynamic Programming with an Algorithm-by-Blocks\",\"authors\":\"Jiajia Li, Guangming Tan, Mingyu Chen\",\"doi\":\"10.1109/ICPADS.2010.117\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As the complexity of current computer architecture increases, domain-specific program generators are extensively used to implement performance portable libraries. Dynamic programming is a performance-critical kernel in many applications including engineering operations and bioinformatics. In this paper, we propose an Automatically Tuned Dynamic Programming (ATDP) to optimize performance of dynamic programming algorithm across various architectures. First, an algorithm-by-blocks for dynamic programming is designed to facilitate optimizing with well-known techniques including cache and register tiling. Further, the parameterized algorithm-by-blocks is cooperative with an auto-tuning framework and leverages a hill climbing algorithm to search the possible best program on a given platform. The experiments on two x86 processors demonstrate that (i) the generated scalar programs improve performance by over 10 times, (ii) the vector programs further speedup the scalar ones by a factor of 4 and 2 for single-precision and double-precision, respectively.\",\"PeriodicalId\":365914,\"journal\":{\"name\":\"2010 IEEE 16th International Conference on Parallel and Distributed Systems\",\"volume\":\"15 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-12-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 IEEE 16th International Conference on Parallel and Distributed Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICPADS.2010.117\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE 16th International Conference on Parallel and Distributed Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPADS.2010.117","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Automatically Tuned Dynamic Programming with an Algorithm-by-Blocks
As the complexity of current computer architecture increases, domain-specific program generators are extensively used to implement performance portable libraries. Dynamic programming is a performance-critical kernel in many applications including engineering operations and bioinformatics. In this paper, we propose an Automatically Tuned Dynamic Programming (ATDP) to optimize performance of dynamic programming algorithm across various architectures. First, an algorithm-by-blocks for dynamic programming is designed to facilitate optimizing with well-known techniques including cache and register tiling. Further, the parameterized algorithm-by-blocks is cooperative with an auto-tuning framework and leverages a hill climbing algorithm to search the possible best program on a given platform. The experiments on two x86 processors demonstrate that (i) the generated scalar programs improve performance by over 10 times, (ii) the vector programs further speedup the scalar ones by a factor of 4 and 2 for single-precision and double-precision, respectively.