A dynamic programming approach to optimizing the blocking strategy for the Householder QR decomposition
Authors: Takeshi Fukaya, Yusaku Yamamoto, Shaoliang Zhang
Published in: 2008 IEEE International Conference on Cluster Computing (publication date 2008-10-31)
DOI: 10.1109/CLUSTR.2008.4663801 (https://doi.org/10.1109/CLUSTR.2008.4663801)
In this paper, we present a new approach to optimizing the blocking strategy for the Householder QR decomposition. In high-performance implementations of the Householder QR algorithm, it is common to use a blocking technique to make efficient use of the cache memory. There are several well-known blocking strategies, such as fixed-size blocking and recursive blocking, and their parameters, such as the block size and the recursion level, are usually tuned according to the target machine and the problem size. However, the strategies generated by this kind of parameter optimization constitute only a small fraction of all possible blocking strategies. Given the complex performance characteristics of modern microprocessors, non-standard strategies may prove effective on some machines. Considering this situation, we first propose a new universal model that can express a far larger class of blocking strategies than has been considered so far. Next, we give an algorithm that uses dynamic programming to find a near-optimal strategy from this class. As a result of this approach, we found an effective blocking strategy that has not been reported before. Performance evaluation on the Opteron and Core2 processors shows that our strategy achieves a speedup of about 1.2 times over recursive blocking when computing the QR decomposition of a 6000 × 6000 matrix.
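The abstract does not spell out the model or the cost function, so the following is only a toy sketch of how a dynamic program over blocking strategies might look, not the authors' algorithm. It assumes that a strategy can be written as a binary split tree over the columns of a panel, which covers both fixed-size blocking (always split off the same width) and recursive blocking (always split in half). The names `leaf_cost`, `update_cost`, `best_strategy`, and all machine parameters are hypothetical. A real tuner would presumably replace the two cost functions with timings measured on the target machine.

```python
from functools import lru_cache

# Hypothetical machine parameters (assumptions for illustration only).
BLAS2_RATE = 1.0      # relative speed of unblocked, memory-bound kernels
PEAK_RATE = 8.0       # relative peak speed of matrix-matrix updates
K_HALF = 16.0         # block width at which updates reach half of peak
CALL_OVERHEAD = 50.0  # fixed cost charged per blocked update


def leaf_cost(rows, width):
    """Toy cost of an unblocked Householder QR of a rows x width panel."""
    return (2.0 * rows * width**2 - 2.0 * width**3 / 3.0) / BLAS2_RATE


def update_cost(rows, k, cols):
    """Toy cost of applying k reflectors to a rows x cols trailing block."""
    rate = PEAK_RATE * k / (k + K_HALF)
    return 4.0 * rows * k * cols / rate + CALL_OVERHEAD


def best_strategy(m, n):
    """Return (cost, tree) for the cheapest split tree of an m x n matrix (m >= n).

    A tree is either an int (width of an unblocked leaf panel) or a
    pair (left, right) of subtrees.
    """
    @lru_cache(maxsize=None)
    def solve(offset, width):
        rows = m - offset
        best = (leaf_cost(rows, width), width)  # option 1: do not split
        for k in range(1, width):               # option 2: split after k columns
            left_cost, left = solve(offset, k)
            right_cost, right = solve(offset + k, width - k)
            cost = left_cost + update_cost(rows, k, width - k) + right_cost
            if cost < best[0]:
                best = (cost, (left, right))
        return best

    return solve(0, n)


def leaf_widths(tree):
    """Flatten a split tree into its leaf panel widths, left to right."""
    if isinstance(tree, int):
        return [tree]
    left, right = tree
    return leaf_widths(left) + leaf_widths(right)


def halving_cost(m, offset, width, leaf_width):
    """Cost of plain recursive halving down to panels of at most leaf_width."""
    rows = m - offset
    if width <= leaf_width:
        return leaf_cost(rows, width)
    k = width // 2
    return (halving_cost(m, offset, k, leaf_width)
            + update_cost(rows, k, width - k)
            + halving_cost(m, offset + k, width - k, leaf_width))


if __name__ == "__main__":
    cost, tree = best_strategy(96, 48)
    print("leaf widths:", leaf_widths(tree))
    print("DP cost:", cost, "halving cost:", halving_cost(96, 0, 48, 4))
```

Because recursive halving is one of the split trees the search considers, the cost returned by `best_strategy` can never exceed `halving_cost` under the same toy model. Whether the gap is meaningful depends entirely on how well the cost functions reflect the real machine.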