An investigation into the impact of the structured QR kernel on the overall performance of the TSQR algorithm
Takeshi Fukaya
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2019-01-14
DOI: 10.1145/3293320.3293327 (https://doi.org/10.1145/3293320.3293327)
Citation count: 0
Abstract
The TSQR algorithm is a communication-avoiding algorithm for computing the QR factorization of a tall and skinny (TS) matrix. It repeatedly executes a kernel that computes the QR factorization of a structured matrix. Although a single execution of structured QR has a small computational cost, the number of executions depends on the number of active parallel processes. The complicated computational pattern and small matrix size of structured QR are obstacles to achieving high performance, so its computational cost becomes a significant bottleneck in massively parallel computation. In this paper, we focus on the structured QR kernel and discuss its implementation. We compare several kernels, including those provided in LAPACK, on modern processors and investigate how the choice of kernel affects the overall performance of the TSQR algorithm.
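
To make the role of the structured QR kernel concrete, the following is a minimal serial sketch of TSQR that computes only the R factor. It assumes the usual formulation in which the structured matrix is two n x n upper-triangular factors stacked on top of each other. The function names (`local_qr_r`, `structured_qr_r`, `tsqr_r`), the Givens-rotation kernel, and the binary reduction tree are illustrative assumptions, not the kernels or the implementation evaluated in the paper.

```python
import math

def local_qr_r(block):
    """R factor of a dense m x n block (m >= n) via Householder reflections."""
    m, n = len(block), len(block[0])
    r = [row[:] for row in block]
    for k in range(n):
        v = [r[i][k] for i in range(k, m)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            continue  # column is already zero at and below the diagonal
        v[0] += math.copysign(norm, v[0])
        vnorm2 = sum(x * x for x in v)
        for j in range(k, n):
            s = 2.0 * sum(v[i] * r[k + i][j] for i in range(m - k)) / vnorm2
            for i in range(m - k):
                r[k + i][j] -= s * v[i]
    # Keep the leading n x n triangle and write exact zeros below the diagonal.
    return [[r[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]

def structured_qr_r(r_top, r_bottom):
    """R factor of [r_top; r_bottom], both n x n upper triangular.

    Illustrative kernel (an assumption, not the paper's): Givens rotations
    that only touch entries known to be nonzero in the stacked matrix.
    """
    n = len(r_top)
    top = [row[:] for row in r_top]
    bot = [row[:] for row in r_bottom]
    for k in range(n):
        for i in range(k + 1):  # nonzeros of column k in the bottom triangle
            a, b = top[k][k], bot[i][k]
            h = math.hypot(a, b)
            if h == 0.0:
                continue
            c, s = a / h, b / h
            for j in range(k, n):  # both rows are zero to the left of column k
                t, u = top[k][j], bot[i][j]
                top[k][j] = c * t + s * u
                bot[i][j] = c * u - s * t
    return top

def tsqr_r(a, num_blocks):
    """R factor of a tall-skinny matrix via local QR plus a binary reduction tree."""
    m = len(a)
    bounds = [b * m // num_blocks for b in range(num_blocks + 1)]
    # Step 1: independent QR of each row block (one block per process).
    rs = [local_qr_r(a[bounds[b]:bounds[b + 1]]) for b in range(num_blocks)]
    # Step 2: merge pairs of R factors with the structured kernel.
    while len(rs) > 1:
        merged = [structured_qr_r(rs[i], rs[i + 1])
                  for i in range(0, len(rs) - 1, 2)]
        if len(rs) % 2:
            merged.append(rs[-1])
        rs = merged
    return rs[0]

# Hypothetical 16 x 3 test matrix split into 4 row blocks of 4 rows each.
a = [[math.sin(3 * i + j + 1) for j in range(3)] for i in range(16)]
r = tsqr_r(a, 4)
```

In this sketch the structured kernel runs `num_blocks - 1` times in total, and about log2(`num_blocks`) times along the critical path of the tree, so the number of tiny structured factorizations grows with the process count. The resulting `r` should agree with the R factor of a direct QR factorization of `a` up to the signs of its rows.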