Reduction to Tridiagonal Form for Symmetric Eigenproblems on Asymmetric Multicore Processors

Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores. Pub Date: 2017-02-04. DOI: 10.1145/3026937.3026938

P. Alonso, Sandra Catalán, J. Herrero, E. S. Quintana‐Ortí, Rafael Rodríguez-Sánchez


Citations: 1

Abstract

Asymmetric multicore processors (AMPs), such as those based on ARM big.LITTLE technology, have been proposed as a means to address the end of Dennard scaling. The idea behind these architectures is to activate only the type (and number) of cores needed to satisfy the quality of service requested by the running application(s) while delivering high energy efficiency. For dense linear algebra problems, though, performance is of paramount importance, which calls for an efficient use of all computational resources in the AMP. In response, we investigate how to exploit the asymmetric cores of an ARMv7 big.LITTLE AMP to attain high performance for the reduction to tridiagonal form, an essential step towards the solution of dense symmetric eigenvalue problems. The LAPACK routine for this purpose is especially challenging, since half of its floating-point arithmetic operations (flops) are cast in terms of compute-bound kernels while the remaining half correspond to memory-bound kernels. To deal with this scenario, we 1) leverage a tuned implementation of the compute-bound kernels for AMPs; 2) develop and parallelize new architecture-aware micro-kernels for the memory-bound kernels; and 3) carefully adjust the type and number of cores to use at each step of the reduction procedure.
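To make the compute-bound versus memory-bound split concrete, the following is a minimal pure-Python sketch of an unblocked Householder reduction to tridiagonal form. It is a textbook illustration, not the authors' implementation and not LAPACK's code; the function names `symv` and `tridiagonalize` are hypothetical. Each step performs one symmetric matrix-vector product (the memory-bound part) followed by a symmetric rank-2 update of the trailing submatrix. As general background rather than a claim from the abstract, blocked variants delay these rank-2 updates and apply several of them at once as a rank-2k update, which is where the compute-bound half of the flops comes from.

```python
import math

def symv(B, v):
    """Symmetric matrix-vector product y = B v.
    Memory-bound: about 2*m^2 flops while touching m^2 matrix entries."""
    return [sum(b * x for b, x in zip(row, v)) for row in B]

def tridiagonalize(A):
    """Return T = Q^T A Q (tridiagonal) for a symmetric matrix A given as a
    list of lists, using unblocked Householder reflectors (illustrative only)."""
    n = len(A)
    A = [row[:] for row in A]  # work on a copy
    for k in range(n - 2):
        # Column below the diagonal that must be reduced to (alpha, 0, ..., 0).
        x = [A[i][k] for i in range(k + 1, n)]
        norm_x = math.sqrt(sum(xi * xi for xi in x))
        if norm_x == 0.0:
            continue  # column is already zero, nothing to do
        # Sign chosen to avoid cancellation when forming v.
        alpha = -math.copysign(norm_x, x[0])
        v = x[:]
        v[0] -= alpha
        tau = 2.0 / sum(vi * vi for vi in v)  # H = I - tau * v * v^T

        # Trailing submatrix B = A[k+1:, k+1:].
        m = n - k - 1
        B = [A[k + 1 + i][k + 1:] for i in range(m)]

        # Memory-bound step: p = tau * B * v.
        p = [tau * y for y in symv(B, v)]
        c = 0.5 * tau * sum(pi * vi for pi, vi in zip(p, v))
        w = [pi - c * vi for pi, vi in zip(p, v)]

        # Symmetric rank-2 update: B <- B - v w^T - w v^T, which equals H B H.
        for i in range(m):
            for j in range(m):
                A[k + 1 + i][k + 1 + j] -= v[i] * w[j] + w[i] * v[j]

        # Column k (and row k, by symmetry) is now (alpha, 0, ..., 0).
        A[k + 1][k] = A[k][k + 1] = alpha
        for i in range(k + 2, n):
            A[i][k] = A[k][i] = 0.0
    return A

if __name__ == "__main__":
    A = [[4.0, 1.0, -2.0, 2.0],
         [1.0, 2.0, 0.0, 1.0],
         [-2.0, 0.0, 3.0, -2.0],
         [2.0, 1.0, -2.0, -1.0]]
    for row in tridiagonalize(A):
        print(["%.4f" % t for t in row])
```

Because the transformation is an orthogonal similarity, the trace and the eigenvalues of the input are preserved, which gives a quick sanity check for the sketch.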


