Achieving Scalable Parallelization for the Hessenberg Factorization
A. Castaldo, R. Clint Whaley
2011 IEEE International Conference on Cluster Computing, 2011-09-26
DOI: 10.1109/CLUSTER.2011.16 (https://doi.org/10.1109/CLUSTER.2011.16)
Citations: 9
Abstract
Much of dense linear algebra has been successfully blocked to concentrate the majority of its time in the Level 3 BLAS, which are not only efficient for serial computation but also scale well in parallel. For the Hessenberg factorization, however, which is a critical step in computing eigenvalues and eigenvectors, the performance of the best known algorithm is still strongly limited by memory speed, which does not tend to scale well at all. In this paper we present an adaptation of our Parallel Cache Assignment (PCA) technique to the Hessenberg factorization, and show that it achieves superlinear speedup over the corresponding serial algorithm and a more than four-fold speedup over the best known algorithm for small and medium-sized problems.
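
For readers unfamiliar with the factorization itself, the following is a minimal sketch of a textbook unblocked Householder reduction to upper Hessenberg form. It is not the PCA algorithm described in the paper, and the function name `hessenberg` and the sample matrix are illustrative assumptions. Each step applies a reflector from the left and from the right using matrix-vector style updates, the kind of operation whose speed tends to be bounded by memory bandwidth rather than arithmetic.

```python
import math


def hessenberg(a):
    """Return an upper Hessenberg matrix orthogonally similar to `a`.

    `a` is a square matrix given as a list of lists of floats.
    Illustrative, unblocked Householder reduction (not the paper's method).
    """
    n = len(a)
    h = [row[:] for row in a]  # work on a copy
    for k in range(n - 2):
        # Column k below the diagonal: the part we want to reduce.
        x = [h[i][k] for i in range(k + 1, n)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # column is already zero below the diagonal
        # Choose the sign that avoids cancellation in v[0].
        alpha = -math.copysign(norm_x, x[0])
        v = x[:]
        v[0] -= alpha
        norm_v = math.sqrt(sum(t * t for t in v))
        v = [t / norm_v for t in v]
        m = len(v)
        # Left update: rows k+1..n-1 become (I - 2 v v^T) times those rows.
        for j in range(n):
            dot = sum(v[i] * h[k + 1 + i][j] for i in range(m))
            for i in range(m):
                h[k + 1 + i][j] -= 2.0 * v[i] * dot
        # Right update: columns k+1..n-1 become those columns times (I - 2 v v^T).
        for i in range(n):
            dot = sum(h[i][k + 1 + j] * v[j] for j in range(m))
            for j in range(m):
                h[i][k + 1 + j] -= 2.0 * dot * v[j]
    return h


if __name__ == "__main__":
    sample = [
        [4.0, 1.0, 2.0, 3.0],
        [1.0, 3.0, 0.0, 1.0],
        [2.0, 0.0, 2.0, 1.0],
        [3.0, 1.0, 1.0, 1.0],
    ]
    for row in hessenberg(sample):
        print(["%8.4f" % t for t in row])
```

Because the result is obtained by an orthogonal similarity transformation, it has the same eigenvalues as the input, which is why the reduction serves as a preprocessing step for eigenvalue solvers.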