P. Alonso, Sandra Catalán, J. Herrero, E. S. Quintana‐Ortí, Rafael Rodríguez-Sánchez
Reduction to Tridiagonal Form for Symmetric Eigenproblems on Asymmetric Multicore Processors
Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, 2017-02-04
DOI: https://doi.org/10.1145/3026937.3026938
Citations: 1
Abstract
Asymmetric multicore processors (AMPs), such as those based on ARM big.LITTLE technology, have been proposed as a means to address the end of Dennard power scaling. The idea behind these architectures is to activate only the type (and number) of cores that satisfy the quality of service requested by the running application(s) while delivering high energy efficiency. For dense linear algebra problems, though, performance is of paramount importance, which calls for efficient use of all computational resources in the AMP. In response, we investigate how to exploit the asymmetric cores of an ARMv7 big.LITTLE AMP to attain high performance for the reduction to tridiagonal form, an essential step towards the solution of dense symmetric eigenvalue problems. The LAPACK routine for this purpose is especially challenging, since half of its floating-point arithmetic operations (flops) are cast in terms of compute-bound kernels while the remaining half correspond to memory-bound kernels. To deal with this scenario, we 1) leverage a tuned implementation of the compute-bound kernels for AMPs; 2) develop and parallelize new architecture-aware micro-kernels for the memory-bound kernels; and 3) carefully adjust the type and number of cores used at each step of the reduction procedure.
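For readers unfamiliar with the operation, the following is a minimal, unoptimized sketch of a Householder reduction to tridiagonal form in plain Python. It is not the authors' implementation and not LAPACK's blocked routine; the function name `tridiagonalize` and the list-of-lists layout are assumptions made for illustration. It only aims to show where the two kinds of work arise: each step performs a symmetric matrix-vector product (memory-bound) followed by a symmetric rank-2 update of the trailing submatrix. Blocked implementations typically accumulate several such updates and apply them together as a matrix-matrix operation, which is where the compute-bound half of the flops would come from.

```python
import math


def tridiagonalize(a):
    """Reduce a symmetric matrix (list of lists) to tridiagonal form.

    Returns (d, e): the diagonal and sub-diagonal of T = Q^T A Q.
    Illustrative unblocked sketch; the name and interface are hypothetical.
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy, leave the input untouched
    for k in range(n - 2):
        # Build the Householder vector v that zeroes a[k+2:, k].
        x = [a[i][k] for i in range(k + 1, n)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # column is already in the required form
        alpha = -math.copysign(norm_x, x[0])
        v = x[:]
        v[0] -= alpha
        beta = 2.0 / sum(t * t for t in v)
        m = len(v)
        # Memory-bound step: symmetric matrix-vector product p = beta * A22 * v.
        p = [
            beta * sum(a[k + 1 + i][k + 1 + j] * v[j] for j in range(m))
            for i in range(m)
        ]
        # w = p - (beta / 2) * (p . v) * v
        s = 0.5 * beta * sum(pi * vi for pi, vi in zip(p, v))
        w = [pi - s * vi for pi, vi in zip(p, v)]
        # Symmetric rank-2 update of the trailing block: A22 -= v w^T + w v^T.
        for i in range(m):
            for j in range(m):
                a[k + 1 + i][k + 1 + j] -= v[i] * w[j] + w[i] * v[j]
        # Column k is now alpha just below the diagonal and zero beyond it.
        a[k + 1][k] = a[k][k + 1] = alpha
        for i in range(k + 2, n):
            a[i][k] = a[k][i] = 0.0
    d = [a[i][i] for i in range(n)]
    e = [a[i + 1][i] for i in range(n - 1)]
    return d, e
```

Because the transformation is an orthogonal similarity, the trace and the Frobenius norm of the input are preserved, which gives a cheap sanity check on the returned `d` and `e`.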