分布式多核系统上线性代数算法的动态任务调度

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis Pub Date : 2009-11-14 DOI:10.1145/1654059.1654079

Fengguang Song, A. YarKhan, J. Dongarra

{"title":"分布式多核系统上线性代数算法的动态任务调度","authors":"Fengguang Song, A. YarKhan, J. Dongarra","doi":"10.1145/1654059.1654079","DOIUrl":null,"url":null,"abstract":"This paper presents a dynamic task scheduling approach to executing dense linear algebra algorithms on multicore systems (either shared-memory or distributed-memory). We use a task-based library to replace the existing linear algebra subroutines such as PBLAS to transparently provide the same interface and computational function as the ScaLAPACK library. Linear algebra programs are written with the task-based library and executed by a dynamic runtime system. We mainly focus our runtime system design on the metric of performance scalability. We propose a distributed algorithm to solve data dependences without process cooperation. We have implemented the runtime system and applied it to three linear algebra algorithms: Cholesky, LU, and QR factorizations. Our experiments on both shared-memory machines (16, 32 cores) and distributed-memory machines (1024 cores) demonstrate that our runtime system is able to achieve good scalability. Furthermore, we provide analytical analysis to show why the tiled algorithms are scalable and the expected execution time.","PeriodicalId":371415,"journal":{"name":"Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"123","resultStr":"{\"title\":\"Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems\",\"authors\":\"Fengguang Song, A. YarKhan, J. Dongarra\",\"doi\":\"10.1145/1654059.1654079\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper presents a dynamic task scheduling approach to executing dense linear algebra algorithms on multicore systems (either shared-memory or distributed-memory). We use a task-based library to replace the existing linear algebra subroutines such as PBLAS to transparently provide the same interface and computational function as the ScaLAPACK library. Linear algebra programs are written with the task-based library and executed by a dynamic runtime system. We mainly focus our runtime system design on the metric of performance scalability. We propose a distributed algorithm to solve data dependences without process cooperation. We have implemented the runtime system and applied it to three linear algebra algorithms: Cholesky, LU, and QR factorizations. Our experiments on both shared-memory machines (16, 32 cores) and distributed-memory machines (1024 cores) demonstrate that our runtime system is able to achieve good scalability. Furthermore, we provide analytical analysis to show why the tiled algorithms are scalable and the expected execution time.\",\"PeriodicalId\":371415,\"journal\":{\"name\":\"Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis\",\"volume\":\"25 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-11-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"123\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1654059.1654079\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1654059.1654079","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 123

摘要

本文提出了一种在多核系统(共享内存或分布式内存)上执行密集线性代数算法的动态任务调度方法。我们使用基于任务的库来取代现有的线性代数子程序，如PBLAS，以透明地提供与ScaLAPACK库相同的接口和计算功能。线性代数程序使用基于任务的库编写，并由动态运行时系统执行。我们主要将运行时系统设计的重点放在性能可伸缩性的度量上。提出了一种分布式算法来解决无进程协作的数据依赖问题。我们已经实现了运行时系统，并将其应用于三种线性代数算法:Cholesky, LU和QR分解。我们在共享内存机器(16,32核)和分布式内存机器(1024核)上的实验表明，我们的运行时系统能够实现良好的可伸缩性。此外，我们提供了分析分析，以说明为什么平铺算法是可扩展的和预期的执行时间。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems

This paper presents a dynamic task scheduling approach to executing dense linear algebra algorithms on multicore systems (either shared-memory or distributed-memory). We use a task-based library to replace the existing linear algebra subroutines such as PBLAS to transparently provide the same interface and computational function as the ScaLAPACK library. Linear algebra programs are written with the task-based library and executed by a dynamic runtime system. We mainly focus our runtime system design on the metric of performance scalability. We propose a distributed algorithm to solve data dependences without process cooperation. We have implemented the runtime system and applied it to three linear algebra algorithms: Cholesky, LU, and QR factorizations. Our experiments on both shared-memory machines (16, 32 cores) and distributed-memory machines (1024 cores) demonstrate that our runtime system is able to achieve good scalability. Furthermore, we provide analytical analysis to show why the tiled algorithms are scalable and the expected execution time.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis

自引率

0.00%

发文量