Scaling Machine Learning via Compressed Linear Algebra

Ahmed Elgohary, Matthias Boehm, P. Haas, Frederick Reiss, B. Reinwald
{"title":"通过压缩线性代数缩放机器学习","authors":"Ahmed Elgohary, Matthias Boehm, P. Haas, Frederick Reiss, B. Reinwald","doi":"10.1145/3093754.3093765","DOIUrl":null,"url":null,"abstract":"Large-scale machine learning (ML) algorithms are often iterative, using repeated read-only data access and I/Obound matrix-vector multiplications to converge to an optimal model. It is crucial for performance to fit the data into single-node or distributed main memory and enable very fast matrix-vector operations on in-memory data. Generalpurpose, heavy- and lightweight compression techniques struggle to achieve both good compression ratios and fast decompression speed to enable block-wise uncompressed operations. Compressed linear algebra (CLA) avoids these problems by applying lightweight lossless database compression techniques to matrices and then executing linear algebra operations such as matrix-vector multiplication directly on the compressed representations. The key ingredients are effective column compression schemes, cache-conscious operations, and an efficient sampling-based compression algorithm. Experiments on an initial implementation in SystemML show in-memory operations performance close to the uncompressed case and good compression ratios.We thereby obtain significant end-to-end performance improvements up to 26x or reduced memory requirements.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"1 1","pages":"42-49"},"PeriodicalIF":0.0000,"publicationDate":"2017-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"Scaling Machine Learning via Compressed Linear Algebra\",\"authors\":\"Ahmed Elgohary, Matthias Boehm, P. Haas, Frederick Reiss, B. Reinwald\",\"doi\":\"10.1145/3093754.3093765\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Large-scale machine learning (ML) algorithms are often iterative, using repeated read-only data access and I/Obound matrix-vector multiplications to converge to an optimal model. It is crucial for performance to fit the data into single-node or distributed main memory and enable very fast matrix-vector operations on in-memory data. Generalpurpose, heavy- and lightweight compression techniques struggle to achieve both good compression ratios and fast decompression speed to enable block-wise uncompressed operations. Compressed linear algebra (CLA) avoids these problems by applying lightweight lossless database compression techniques to matrices and then executing linear algebra operations such as matrix-vector multiplication directly on the compressed representations. The key ingredients are effective column compression schemes, cache-conscious operations, and an efficient sampling-based compression algorithm. 
Experiments on an initial implementation in SystemML show in-memory operations performance close to the uncompressed case and good compression ratios.We thereby obtain significant end-to-end performance improvements up to 26x or reduced memory requirements.\",\"PeriodicalId\":21740,\"journal\":{\"name\":\"SIGMOD Rec.\",\"volume\":\"1 1\",\"pages\":\"42-49\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-05-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"SIGMOD Rec.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3093754.3093765\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"SIGMOD Rec.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3093754.3093765","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 9

Abstract

Large-scale machine learning (ML) algorithms are often iterative, using repeated read-only data access and I/O-bound matrix-vector multiplications to converge to an optimal model. It is crucial for performance to fit the data into single-node or distributed main memory and enable very fast matrix-vector operations on in-memory data. General-purpose, heavy- and lightweight compression techniques struggle to achieve both good compression ratios and fast decompression speed to enable block-wise uncompressed operations. Compressed linear algebra (CLA) avoids these problems by applying lightweight lossless database compression techniques to matrices and then executing linear algebra operations such as matrix-vector multiplication directly on the compressed representations. The key ingredients are effective column compression schemes, cache-conscious operations, and an efficient sampling-based compression algorithm. Experiments on an initial implementation in SystemML show in-memory operations performance close to the uncompressed case and good compression ratios. We thereby obtain significant end-to-end performance improvements up to 26x or reduced memory requirements.
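To make the core idea concrete, the sketch below illustrates matrix-vector multiplication executed directly on a column-compressed representation: each column is stored as its distinct values plus, for each value, the list of row offsets where it occurs, so the product touches each distinct value once rather than every cell. This is a minimal illustrative example only, not SystemML's actual CLA implementation; the function names and encoding are assumptions made for the sketch.

```python
import numpy as np

def compress_column(col):
    """Encode one column as (distinct values, per-value row-offset lists).

    Hypothetical helper, loosely inspired by offset-list column encodings;
    not the SystemML API.
    """
    values, inverse = np.unique(col, return_inverse=True)
    offsets = [np.flatnonzero(inverse == k) for k in range(len(values))]
    return values, offsets

def compressed_matvec(columns, v):
    """Compute M @ v where M is stored column-wise in compressed form.

    columns: list of (values, offsets) pairs, one per column of M.
    v:       dense vector with one entry per column of M.
    """
    n_rows = sum(len(rows) for rows in columns[0][1])
    out = np.zeros(n_rows)
    for j, (values, offsets) in enumerate(columns):
        for value, rows in zip(values, offsets):
            # One multiply per distinct value, scattered to all of its rows,
            # instead of one multiply per matrix cell.
            out[rows] += value * v[j]
    return out

# Usage: a column with few distinct values compresses well and still
# supports the matrix-vector product without decompression.
M = np.array([[1.0, 7.0],
              [1.0, 7.0],
              [2.0, 7.0],
              [2.0, 9.0]])
v = np.array([0.5, 2.0])
compressed = [compress_column(M[:, j]) for j in range(M.shape[1])]
assert np.allclose(compressed_matvec(compressed, v), M @ v)
```

The design point the sketch tries to convey is that work becomes proportional to the number of distinct values per column rather than the number of rows, which is why fitting compressed data in memory and operating on it directly can approach uncompressed operation speed.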