FatMan vs. LittleBoy: Scaling Up Linear Algebraic Operations in Scale-Out Data Platforms

Luna Xu, Seung-Hwan Lim, A. Butt, S. Sukumar, R. Kannan
{"title":"胖子vs小男孩:扩展数据平台中线性代数运算的扩展","authors":"Luna Xu, Seung-Hwan Lim, A. Butt, S. Sukumar, R. Kannan","doi":"10.1109/PDSW-DISCS.2016.8","DOIUrl":null,"url":null,"abstract":"Linear algebraic operations such as matrix manipulations form the kernel of many machine learning and other crucial algorithms. Scaling up as well as scaling out such algorithms are highly desirable to enable efficient processing over millions of data points. To this end, we present a matrix manipulation approach to effectively scale-up each node in a scale-out data parallel platform such as Apache Spark. Specifically, we enable hardware acceleration for matrix multiplications in a distributed Spark setup without user intervention. Our approach supports both dense and sparse distributed matrices, and provides flexible control of acceleration by matrix density. We demonstrate the benefit of our approach for generalized matrix multiplication operations over large matrices with up to four billion elements. To connect the effectiveness of our approach with machine learning applications, we performed Gramian matrix computation via generalized matrix multiplications. Our experiments show that our approach achieves more than 2× performance speed-up, and up to 96.1% computation improvement, compared to a state of the art Spark MLlib for dense matrices.","PeriodicalId":375550,"journal":{"name":"2016 1st Joint International Workshop on Parallel Data Storage and data Intensive Scalable Computing Systems (PDSW-DISCS)","volume":"364 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"FatMan vs. LittleBoy: Scaling Up Linear Algebraic Operations in Scale-Out Data Platforms\",\"authors\":\"Luna Xu, Seung-Hwan Lim, A. Butt, S. Sukumar, R. Kannan\",\"doi\":\"10.1109/PDSW-DISCS.2016.8\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Linear algebraic operations such as matrix manipulations form the kernel of many machine learning and other crucial algorithms. Scaling up as well as scaling out such algorithms are highly desirable to enable efficient processing over millions of data points. To this end, we present a matrix manipulation approach to effectively scale-up each node in a scale-out data parallel platform such as Apache Spark. Specifically, we enable hardware acceleration for matrix multiplications in a distributed Spark setup without user intervention. Our approach supports both dense and sparse distributed matrices, and provides flexible control of acceleration by matrix density. We demonstrate the benefit of our approach for generalized matrix multiplication operations over large matrices with up to four billion elements. To connect the effectiveness of our approach with machine learning applications, we performed Gramian matrix computation via generalized matrix multiplications. 
Our experiments show that our approach achieves more than 2× performance speed-up, and up to 96.1% computation improvement, compared to a state of the art Spark MLlib for dense matrices.\",\"PeriodicalId\":375550,\"journal\":{\"name\":\"2016 1st Joint International Workshop on Parallel Data Storage and data Intensive Scalable Computing Systems (PDSW-DISCS)\",\"volume\":\"364 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-11-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 1st Joint International Workshop on Parallel Data Storage and data Intensive Scalable Computing Systems (PDSW-DISCS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/PDSW-DISCS.2016.8\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 1st Joint International Workshop on Parallel Data Storage and data Intensive Scalable Computing Systems (PDSW-DISCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PDSW-DISCS.2016.8","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3

Abstract

Linear algebraic operations such as matrix manipulations form the kernel of many machine learning and other crucial algorithms. Scaling up as well as scaling out such algorithms are highly desirable to enable efficient processing over millions of data points. To this end, we present a matrix manipulation approach to effectively scale up each node in a scale-out data parallel platform such as Apache Spark. Specifically, we enable hardware acceleration for matrix multiplications in a distributed Spark setup without user intervention. Our approach supports both dense and sparse distributed matrices, and provides flexible control of acceleration by matrix density. We demonstrate the benefit of our approach for generalized matrix multiplication operations over large matrices with up to four billion elements. To connect the effectiveness of our approach with machine learning applications, we performed Gramian matrix computation via generalized matrix multiplications. Our experiments show that our approach achieves more than 2× performance speed-up, and up to 96.1% computation improvement, compared to the state-of-the-art Spark MLlib for dense matrices.
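The abstract describes transparently routing Spark's distributed matrix multiplications to accelerated kernels, gated by matrix density. The Scala sketch below, using only stock Spark MLlib APIs, illustrates the shape of that workload: the Gramian G = AᵀA computed via BlockMatrix multiplication, with a density estimate of the kind that could drive such a policy. The `densityThreshold` knob and the global density heuristic are illustrative assumptions, not the authors' implementation.

```scala
// A minimal sketch (not the authors' code) of Gramian computation
// G = A^T * A via generalized matrix multiplication on Spark MLlib's
// BlockMatrix, the distributed representation the paper builds on.
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

object GramianSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("GramianSketch").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // A toy sparse matrix A in coordinate form; the paper's experiments
    // scale this up to matrices with billions of elements.
    val entries = sc.parallelize(Seq(
      MatrixEntry(0, 0, 1.0), MatrixEntry(1, 1, 2.0),
      MatrixEntry(2, 0, 3.0), MatrixEntry(3, 2, 4.0)))
    val coord = new CoordinateMatrix(entries, 4L, 3L)

    // Estimate density; a density-based policy like the paper's would
    // route sufficiently dense multiplications to accelerated kernels.
    val density = entries.count().toDouble / (coord.numRows() * coord.numCols())
    val densityThreshold = 0.5 // assumed tunable knob, not from the paper

    val A = coord.toBlockMatrix(2, 2).cache()

    // Gramian via GEMM. Stock MLlib runs JVM kernels here; the paper's
    // approach would transparently dispatch dense work to native BLAS.
    val gramian = A.transpose.multiply(A)
    println(s"density = $density, accelerate = ${density >= densityThreshold}")
    println(gramian.toLocalMatrix())

    spark.stop()
  }
}
```

Tracking density per block rather than globally, as sketched in the comment above, would let a mixed-density matrix send only its dense blocks down the accelerated path, which is one plausible reading of the "flexible control of acceleration by matrix density" the abstract claims.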