A Chunking Method for Euclidean Distance Matrix Calculation on Large Dataset Using Multi-GPU

Qi Li, V. Kecman, R. Salman
Published in: 2010 Ninth International Conference on Machine Learning and Applications
Publication date: 2010-12-12
DOI: 10.1109/ICMLA.2010.38 (https://doi.org/10.1109/ICMLA.2010.38)
Citations: 69

Abstract

Calculating a Euclidean distance matrix is a data-intensive operation that becomes computationally prohibitive for large datasets. Recent Graphics Processing Units (GPUs) deliver excellent performance on scientific computing problems through massively parallel processing cores. However, because device memory is limited, many GPU-based algorithms cope poorly with large datasets. In this paper, we propose a chunking method for calculating the Euclidean distance matrix on large datasets. The method is designed not only to scale in a multi-GPU environment but also to maximize the computational capability of each individual GPU device. We first implement a fast GPU algorithm suited to calculating sub-matrices of the Euclidean distance matrix. We then use a Map-Reduce-like framework to split the full distance matrix calculation into many small, independent jobs, each calculating a partial distance matrix that our GPU algorithm can solve efficiently. The framework also dynamically allocates GPU resources to these independent jobs for maximum performance. Experimental results show a 15x speedup on datasets containing more than half a million data points.
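
The chunking idea in the abstract can be illustrated with a small CPU-only sketch. It is not the authors' implementation: the abstract does not describe their GPU kernel or framework, so a thread pool stands in for the pool of GPUs, and every name here (`block_ranges`, `partial_distances`, `chunked_distance_matrix`, the `chunk` and `workers` parameters) is hypothetical. It shows only the structure: the full matrix is cut into tiles, each tile is an independent job that needs just two blocks of points, and idle workers pick up the next pending job.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def block_ranges(n, chunk):
    """Split range(n) into consecutive (start, stop) blocks of at most `chunk` items."""
    return [(start, min(start + chunk, n)) for start in range(0, n, chunk)]


def partial_distances(points, rows, cols):
    """One independent job: the tile of distances for a row block and a column block.

    Assumption: this stands in for a GPU kernel. Only the two blocks of
    points it reads would need to fit in device memory.
    """
    (r0, r1), (c0, c1) = rows, cols
    tile = [[math.dist(points[i], points[j]) for j in range(c0, c1)]
            for i in range(r0, r1)]
    return r0, c0, tile


def chunked_distance_matrix(points, chunk=2, workers=2):
    """Compute the full distance matrix tile by tile on a pool of workers."""
    n = len(points)
    full = [[0.0] * n for _ in range(n)]
    blocks = block_ranges(n, chunk)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # "Map": one job per pair of blocks; a free worker takes the next job.
        futures = [pool.submit(partial_distances, points, rows, cols)
                   for rows, cols in product(blocks, blocks)]
        # "Reduce": copy each finished tile into its place in the full matrix.
        for future in futures:
            r0, c0, tile = future.result()
            for offset, row in enumerate(tile):
                full[r0 + offset][c0:c0 + len(row)] = row
    return full


if __name__ == "__main__":
    pts = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
    for row in chunked_distance_matrix(pts, chunk=2, workers=2):
        print(row)
```

The sketch assembles the whole matrix in host memory, which is workable only for small inputs. With n points the matrix has n² entries, so at the dataset sizes the abstract mentions each tile would have to be consumed or written out as it completes rather than kept.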