J. N. F. Alves, L. Russo, Alexandre P. Francisco, S. Benkner
{"title":"一种新的用于方阵缓存无关就地转置的三角形空间填充曲线","authors":"J. N. F. Alves, L. Russo, Alexandre P. Francisco, S. Benkner","doi":"10.1109/IPDPS54959.2023.00045","DOIUrl":null,"url":null,"abstract":"This paper proposes a novel cache-oblivious blocking scheme based on a new triangular space-filling curve which preserves data locality. The proposed blocking-scheme reduces the movement of data within the host memory hierarchy for triangular matrix traversals, which inherently exhibit poor data locality, such as the in-place transposition of square matrices. We show that our cache-oblivious blocking-scheme can be generated iteratively in linear time and constant memory with regard to the number of entries present in the lower, or upper, triangle of the input matrix. In contrast to classical recursive cache-oblivious solutions, the iterative nature of our blocking-scheme does not inhibit other essential optimizations such as software prefetching. In order to assess the viability of our blocking-scheme as a cache-oblivious strategy, we applied it to the in-place transposition of square matrices. Extensive experiments show that our cache-oblivious transposition algorithm generally outperforms the cache-aware state-of-the-art algorithm in terms of throughput and energy efficiency in sequential as well as parallel environments.","PeriodicalId":343684,"journal":{"name":"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Novel Triangular Space-Filling Curve for Cache-Oblivious In-Place Transposition of Square Matrices\",\"authors\":\"J. N. F. Alves, L. Russo, Alexandre P. Francisco, S. Benkner\",\"doi\":\"10.1109/IPDPS54959.2023.00045\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper proposes a novel cache-oblivious blocking scheme based on a new triangular space-filling curve which preserves data locality. The proposed blocking-scheme reduces the movement of data within the host memory hierarchy for triangular matrix traversals, which inherently exhibit poor data locality, such as the in-place transposition of square matrices. We show that our cache-oblivious blocking-scheme can be generated iteratively in linear time and constant memory with regard to the number of entries present in the lower, or upper, triangle of the input matrix. In contrast to classical recursive cache-oblivious solutions, the iterative nature of our blocking-scheme does not inhibit other essential optimizations such as software prefetching. In order to assess the viability of our blocking-scheme as a cache-oblivious strategy, we applied it to the in-place transposition of square matrices. Extensive experiments show that our cache-oblivious transposition algorithm generally outperforms the cache-aware state-of-the-art algorithm in terms of throughput and energy efficiency in sequential as well as parallel environments.\",\"PeriodicalId\":343684,\"journal\":{\"name\":\"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"volume\":\"4 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPS54959.2023.00045\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS54959.2023.00045","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Novel Triangular Space-Filling Curve for Cache-Oblivious In-Place Transposition of Square Matrices
This paper proposes a novel cache-oblivious blocking scheme based on a new triangular space-filling curve which preserves data locality. The proposed blocking-scheme reduces the movement of data within the host memory hierarchy for triangular matrix traversals, which inherently exhibit poor data locality, such as the in-place transposition of square matrices. We show that our cache-oblivious blocking-scheme can be generated iteratively in linear time and constant memory with regard to the number of entries present in the lower, or upper, triangle of the input matrix. In contrast to classical recursive cache-oblivious solutions, the iterative nature of our blocking-scheme does not inhibit other essential optimizations such as software prefetching. In order to assess the viability of our blocking-scheme as a cache-oblivious strategy, we applied it to the in-place transposition of square matrices. Extensive experiments show that our cache-oblivious transposition algorithm generally outperforms the cache-aware state-of-the-art algorithm in terms of throughput and energy efficiency in sequential as well as parallel environments.