Towards the design of an automatically tuned linear algebra library

Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing. Pub Date: 2002-01-09. DOI: 10.1109/EMPDP.2002.994270

J. Cuenca, D. Giménez, José González


Citations: 13

Abstract

In this work we propose the architecture of an automatically tuned linear algebra library, which is composed of a set of linear algebra routines along with their installation routines. During the installation process on a system, the linear algebra routines are tuned automatically to the system conditions: the hardware characteristics and the basic libraries used by the linear algebra routines. The design methodology is analysed with a block LU factorisation. Variants of a sequential and a parallel version of this routine, the latter on a logical rectangular mesh of processors, are considered. An analytical model of the algorithm is developed as the basis of our methodology, and the behaviour of the algorithm is analysed with message passing using MPI on several platforms (a network of SUN workstations, an SGI Origin 2000 and an IBM SP2) and with different basic linear algebra libraries (reference BLAS, machine-specific BLAS and ATLAS). The experiments show that it is possible to make a good automatic choice of the configurable parameters of the linear algebra routines during the installation process. The average execution time of the linear algebra routine is reduced by about 15% with respect to the non-tuned version.
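To make the idea of install-time parameter selection concrete, the following is a minimal sketch and not the authors' implementation. It assumes a sequential, right-looking block LU factorisation without pivoting, and it chooses the block size by timing each candidate directly rather than through the analytical model the abstract describes. The names `block_lu`, `make_matrix` and `tune_block_size` are hypothetical, as is the diagonally dominant test matrix used to keep the factorisation stable without pivoting.

```python
import time


def block_lu(a, b):
    """In-place right-looking block LU without pivoting.

    a is a square matrix stored as a list of lists, b is the block size.
    On return, the strict lower triangle holds L (unit diagonal implied)
    and the upper triangle holds U.
    """
    n = len(a)
    for k in range(0, n, b):
        e = min(k + b, n)
        # Step 1: unblocked LU of the panel a[k:n, k:e].
        for j in range(k, e):
            for i in range(j + 1, n):
                a[i][j] /= a[j][j]
                for c in range(j + 1, e):
                    a[i][c] -= a[i][j] * a[j][c]
        # Step 2: triangular solve, U12 = L11^{-1} A12.
        for j in range(k, e):
            for i in range(j + 1, e):
                lij = a[i][j]
                for c in range(e, n):
                    a[i][c] -= lij * a[j][c]
        # Step 3: trailing update, A22 -= L21 * U12.
        for i in range(e, n):
            for j in range(k, e):
                lij = a[i][j]
                for c in range(e, n):
                    a[i][c] -= lij * a[j][c]
    return a


def make_matrix(n):
    """Hypothetical test matrix: diagonally dominant, so no pivoting is needed."""
    return [[float(n) if i == j else 1.0 / (1 + i + j) for j in range(n)]
            for i in range(n)]


def tune_block_size(n, candidates):
    """Time block_lu for each candidate block size and return the fastest."""
    timings = {}
    for b in candidates:
        a = make_matrix(n)
        start = time.perf_counter()
        block_lu(a, b)
        timings[b] = time.perf_counter() - start
    best = min(timings, key=timings.get)
    return best, timings


if __name__ == "__main__":
    best, timings = tune_block_size(64, [4, 8, 16, 32])
    print("selected block size:", best)
```

In pure Python the block size has little effect on run time; the sketch only shows the selection mechanism. In a library of the kind described, steps 2 and 3 would typically be delegated to the installed BLAS, which is where the choice of block size and of basic library would matter.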

