分布式机器学习和线性代数中矩阵实现的自动优化

Proceedings of the 2021 International Conference on Management of Data Pub Date : 2021-06-09 DOI:10.1145/3448016.3457317

Shangyu Luo, Dimitrije Jankov, Binhang Yuan, C. Jermaine

{"title":"分布式机器学习和线性代数中矩阵实现的自动优化","authors":"Shangyu Luo, Dimitrije Jankov, Binhang Yuan, C. Jermaine","doi":"10.1145/3448016.3457317","DOIUrl":null,"url":null,"abstract":"Machine learning (ML) computations are often expressed using vectors, matrices, or higher-dimensional tensors. Such data structures can have many different implementations, especially in a distributed environment: a matrix could be stored as row or column vectors, tiles of different sizes, or relationally, as a set of (rowIndex, colIndex, value) triples. Many other storage formats are possible. The choice of format can have a profound impact on the performance of a ML computation. In this paper, we propose a framework for automatic optimization of the physical implementation of a complex ML or linear algebra (LA) computation in a distributed environment, develop algorithms for solving this problem, and show, through a prototype on top of a distributed relational database system, that our ideas can radically speed up common ML and LA computations.","PeriodicalId":360379,"journal":{"name":"Proceedings of the 2021 International Conference on Management of Data","volume":"50 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"Automatic Optimization of Matrix Implementations for Distributed Machine Learning and Linear Algebra\",\"authors\":\"Shangyu Luo, Dimitrije Jankov, Binhang Yuan, C. Jermaine\",\"doi\":\"10.1145/3448016.3457317\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Machine learning (ML) computations are often expressed using vectors, matrices, or higher-dimensional tensors. Such data structures can have many different implementations, especially in a distributed environment: a matrix could be stored as row or column vectors, tiles of different sizes, or relationally, as a set of (rowIndex, colIndex, value) triples. Many other storage formats are possible. The choice of format can have a profound impact on the performance of a ML computation. In this paper, we propose a framework for automatic optimization of the physical implementation of a complex ML or linear algebra (LA) computation in a distributed environment, develop algorithms for solving this problem, and show, through a prototype on top of a distributed relational database system, that our ideas can radically speed up common ML and LA computations.\",\"PeriodicalId\":360379,\"journal\":{\"name\":\"Proceedings of the 2021 International Conference on Management of Data\",\"volume\":\"50 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-06-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2021 International Conference on Management of Data\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3448016.3457317\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2021 International Conference on Management of Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3448016.3457317","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 8

摘要

机器学习(ML)计算通常使用向量、矩阵或高维张量来表示。这样的数据结构可以有许多不同的实现，特别是在分布式环境中:矩阵可以存储为行或列向量、不同大小的块，或者相对地存储为一组(rowIndex、colIndex、value)三元组。可以使用许多其他存储格式。格式的选择会对ML计算的性能产生深远的影响。在本文中，我们提出了一个框架，用于在分布式环境中自动优化复杂ML或线性代数(LA)计算的物理实现，开发了解决该问题的算法，并通过分布式关系数据库系统之上的原型显示，我们的想法可以从根本上加快常见ML和LA计算。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Automatic Optimization of Matrix Implementations for Distributed Machine Learning and Linear Algebra

Machine learning (ML) computations are often expressed using vectors, matrices, or higher-dimensional tensors. Such data structures can have many different implementations, especially in a distributed environment: a matrix could be stored as row or column vectors, tiles of different sizes, or relationally, as a set of (rowIndex, colIndex, value) triples. Many other storage formats are possible. The choice of format can have a profound impact on the performance of a ML computation. In this paper, we propose a framework for automatic optimization of the physical implementation of a complex ML or linear algebra (LA) computation in a distributed environment, develop algorithms for solving this problem, and show, through a prototype on top of a distributed relational database system, that our ideas can radically speed up common ML and LA computations.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 2021 International Conference on Management of Data

自引率

0.00%

发文量