Leveraging PaRSEC Runtime Support to Tackle Challenging 3D Data-Sparse Matrix Problems

Qinglei Cao, Yu Pei, Kadir Akbudak, G. Bosilca, H. Ltaief, D. Keyes, J. Dongarra
DOI: 10.1109/IPDPS49936.2021.00017
Published in: 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Publication date (as listed): 2020-11-01
URL: https://doi.org/10.1109/IPDPS49936.2021.00017
Citations: 8

Abstract

The task-based programming model associated with dynamic runtime systems has gained popularity for problems made challenging by workload imbalance, heterogeneous resources, or extreme concurrency. During the last decade, low-rank matrix approximations (where the main idea is to exploit data sparsity, typically by compressing off-diagonal tiles up to an application-specific accuracy threshold) have been adopted to address the curse of dimensionality at extreme scale. In this paper, we create a bridge between the runtime and the linear algebra by communicating knowledge of the data sparsity to the runtime. We design and implement this synergistic approach with high user productivity in mind, in the context of the PaRSEC runtime system and the HiCMA numerical library. This requires extending PaRSEC with new features to integrate rank information into the dataflow so that proper decisions can be made at runtime. We focus on the tile low-rank (TLR) Cholesky factorization for solving 3D data-sparse covariance matrix problems arising in environmental applications. In particular, we employ the 3D exponential model of the Matérn matrix kernel, which exhibits challenging nonuniform high ranks in off-diagonal tiles. We first provide dynamic data structure management driven by a performance model to reduce extra floating-point operations. Next, we optimize the memory footprint of the application by relying on a dynamic memory allocator, supported by a rank-aware data distribution, to cope with the workload imbalance. Finally, we expose further parallelism using kernel recursive formulations to shorten the critical path. Our resulting high-performance implementation outperforms existing data-sparse TLR Cholesky factorization by up to 7-fold on a large-scale distributed-memory system, while reducing the memory footprint by up to 44-fold. This multidisciplinary work highlights the need to empower runtime systems beyond their original duty of task scheduling for servicing next-generation low-rank matrix algebra libraries.
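To make the idea of compressing off-diagonal tiles up to an accuracy threshold concrete, the following is a minimal NumPy sketch and not the HiCMA or PaRSEC implementation. It assumes an exponential covariance of the form C[i, j] = exp(-||x_i - x_j|| / length_scale) on random 3D points, a relative singular-value cutoff, and a plain truncated SVD as the compressor; the function names, `length_scale`, `tile_size`, and `tol` are all hypothetical choices made for illustration.

```python
import numpy as np


def exponential_covariance(points, length_scale=0.1):
    """Dense covariance C[i, j] = exp(-||x_i - x_j|| / length_scale) (assumed form)."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return np.exp(-dist / length_scale)


def compress_tile(tile, tol):
    """Truncated SVD keeping singular values above tol * (largest singular value)."""
    u, s, vt = np.linalg.svd(tile, full_matrices=False)
    rank = max(1, int(np.sum(s > tol * s[0])))
    # Fold the singular values into U so that tile ~= U @ V.
    return u[:, :rank] * s[:rank], vt[:rank, :]


def tile_ranks(matrix, tile_size, tol):
    """Rank of every tile; diagonal tiles are kept dense (rank = tile_size)."""
    nt = matrix.shape[0] // tile_size
    ranks = np.full((nt, nt), tile_size, dtype=int)
    for i in range(nt):
        for j in range(i):  # lower triangle only; the matrix is symmetric
            rows = slice(i * tile_size, (i + 1) * tile_size)
            cols = slice(j * tile_size, (j + 1) * tile_size)
            u, _ = compress_tile(matrix[rows, cols], tol)
            ranks[i, j] = ranks[j, i] = u.shape[1]
    return ranks


rng = np.random.default_rng(0)
points = rng.random((512, 3))
# Sort along one axis so nearby indices are somewhat nearby in space
# (a crude stand-in for a proper space-filling-curve ordering).
points = points[np.argsort(points[:, 0])]

cov = exponential_covariance(points)
ranks = tile_ranks(cov, tile_size=64, tol=1e-4)
print(ranks)
```

Printing `ranks` shows how the rank varies from tile to tile, which is the kind of information the abstract proposes handing to the runtime. In a 3D setting with this crude ordering, many off-diagonal ranks can stay close to the tile size, which is consistent with the nonuniform high ranks the abstract describes.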