{"title":"张量-关系代数,以及机器学习系统设计中的其他思想","authors":"C. Jermaine","doi":"10.1145/3468791.3472262","DOIUrl":null,"url":null,"abstract":"ACM Reference Format: Chris Jermaine. 2021. The Tensor-Relational Algebra, and Other Ideas in Machine Learning System Design. In 33rd International Conference on Scientific and Statistical Database Management, July 06–07, 2021, Tampa, FL, USA. ACM, New York, NY, USA, 1 page. https://doi.org/10.1145/3468791.3472262 Systems for machine learning such as TensorFlow and PyTorch have greatly increased the complexity of the models that can be prototyped, tested, and moved into production, as well as reducing the time and effort required to do this. However, the systems have significant limitations. In these systems, a matrix multiplication (or a 2-D convolution, or any of the operations offered by the system) is a black-box operation that must actually be executed somewhere. As such, if there are multiple GPUs available to execute the multiplication the system cannot “figure out” how to automatically distribute the multiplication over them. It has to run an available matrix multiply somewhere, on some hardware. If there is one GPU available but the inputs are too large to fit in the GPU RAM, the system cannot automatically decompose the operation to perform the computation in stages, moving parts of the matrices on and off of the GPU as needed, to stay within the available memory budget. In this talk, I will argue that relations make a compelling implementation abstraction for building ML systems. Modern ML computations often manipuate matrices and tensors. A tensor can be decomposed into a binary relation between (key, payload) pairs, where key identifies the sub-tensor stored in payload (payload could be a scalar value, but more likely, it is a multidimensional array). Such a simple binary relation allows many (or perhaps all) common ML computations to be expressed relationally. For example, consider two, 2× 104 by 2× 104 matrices, decomposed into relations having 400 tuples each:","PeriodicalId":312773,"journal":{"name":"33rd International Conference on Scientific and Statistical Database Management","volume":"39 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"The Tensor-Relational Algebra, and Other Ideas in Machine Learning System Design\",\"authors\":\"C. Jermaine\",\"doi\":\"10.1145/3468791.3472262\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"ACM Reference Format: Chris Jermaine. 2021. The Tensor-Relational Algebra, and Other Ideas in Machine Learning System Design. In 33rd International Conference on Scientific and Statistical Database Management, July 06–07, 2021, Tampa, FL, USA. ACM, New York, NY, USA, 1 page. https://doi.org/10.1145/3468791.3472262 Systems for machine learning such as TensorFlow and PyTorch have greatly increased the complexity of the models that can be prototyped, tested, and moved into production, as well as reducing the time and effort required to do this. However, the systems have significant limitations. In these systems, a matrix multiplication (or a 2-D convolution, or any of the operations offered by the system) is a black-box operation that must actually be executed somewhere. As such, if there are multiple GPUs available to execute the multiplication the system cannot “figure out” how to automatically distribute the multiplication over them. 
It has to run an available matrix multiply somewhere, on some hardware. If there is one GPU available but the inputs are too large to fit in the GPU RAM, the system cannot automatically decompose the operation to perform the computation in stages, moving parts of the matrices on and off of the GPU as needed, to stay within the available memory budget. In this talk, I will argue that relations make a compelling implementation abstraction for building ML systems. Modern ML computations often manipuate matrices and tensors. A tensor can be decomposed into a binary relation between (key, payload) pairs, where key identifies the sub-tensor stored in payload (payload could be a scalar value, but more likely, it is a multidimensional array). Such a simple binary relation allows many (or perhaps all) common ML computations to be expressed relationally. For example, consider two, 2× 104 by 2× 104 matrices, decomposed into relations having 400 tuples each:\",\"PeriodicalId\":312773,\"journal\":{\"name\":\"33rd International Conference on Scientific and Statistical Database Management\",\"volume\":\"39 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-07-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"33rd International Conference on Scientific and Statistical Database Management\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3468791.3472262\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"33rd International Conference on Scientific and Statistical Database Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3468791.3472262","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
The Tensor-Relational Algebra, and Other Ideas in Machine Learning System Design
ACM Reference Format: Chris Jermaine. 2021. The Tensor-Relational Algebra, and Other Ideas in Machine Learning System Design. In 33rd International Conference on Scientific and Statistical Database Management, July 06–07, 2021, Tampa, FL, USA. ACM, New York, NY, USA, 1 page. https://doi.org/10.1145/3468791.3472262

Systems for machine learning such as TensorFlow and PyTorch have greatly increased the complexity of the models that can be prototyped, tested, and moved into production, while reducing the time and effort required to do so. However, these systems have significant limitations. In these systems, a matrix multiplication (or a 2-D convolution, or any of the operations offered by the system) is a black-box operation that must actually be executed somewhere. As such, if there are multiple GPUs available to execute the multiplication, the system cannot "figure out" how to automatically distribute the multiplication over them. It has to run an available matrix multiply somewhere, on some hardware. If there is one GPU available but the inputs are too large to fit in the GPU RAM, the system cannot automatically decompose the operation to perform the computation in stages, moving parts of the matrices on and off of the GPU as needed, to stay within the available memory budget.

In this talk, I will argue that relations make a compelling implementation abstraction for building ML systems. Modern ML computations often manipulate matrices and tensors. A tensor can be decomposed into a binary relation of (key, payload) pairs, where the key identifies the sub-tensor stored in the payload (the payload could be a scalar value, but more likely it is a multidimensional array). Such a simple binary relation allows many (or perhaps all) common ML computations to be expressed relationally. For example, consider two 2 × 10^4 by 2 × 10^4 matrices, decomposed into relations having 400 tuples each:
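As a hedged illustration of what such a decomposition might look like (the block size, function names, and dictionary-based representation below are my own assumptions for the sketch, not the talk's actual system), here is a short Python/NumPy example that stores each matrix as a relation of (key, payload) tuples and expresses matrix multiplication as a join on matching block indices followed by a grouped sum:

```python
# Illustrative sketch only: a matrix is stored as a relation of (key, payload)
# tuples, where key = (row_block, col_block) and payload is the dense sub-matrix.
import numpy as np

def to_relation(m, block):
    """Decompose a dense matrix into {(row_block, col_block): sub_matrix}."""
    nr, nc = m.shape[0] // block, m.shape[1] // block
    return {(i, j): m[i*block:(i+1)*block, j*block:(j+1)*block]
            for i in range(nr) for j in range(nc)}

def relational_matmul(a_rel, b_rel):
    """Matrix multiply written relationally: join a_rel and b_rel on
    a.col_block == b.row_block, multiply each joined pair of payloads,
    then group by (a.row_block, b.col_block) and sum the partial products."""
    out = {}
    for (i, k), a_blk in a_rel.items():               # scan relation A
        for (k2, j), b_blk in b_rel.items():          # join predicate: k == k2
            if k == k2:
                partial = a_blk @ b_blk               # per-tuple block multiply
                out[(i, j)] = out.get((i, j), 0) + partial   # SUM aggregation
    return out

# Small runnable check; with block = 1000, two 2*10^4 by 2*10^4 matrices would
# decompose into relations of 400 tuples each, as in the abstract's example.
rng = np.random.default_rng(0)
x, y = rng.standard_normal((200, 200)), rng.standard_normal((200, 200))
x_rel, y_rel = to_relation(x, block=50), to_relation(y, block=50)
z_rel = relational_matmul(x_rel, y_rel)
z = np.block([[z_rel[(i, j)] for j in range(4)] for i in range(4)])
assert np.allclose(z, x @ y)
```

Because the multiply is now a join plus an aggregation over ordinary tuples, a relational engine is free to partition the tuples across GPUs or stream blocks on and off a single device, which is exactly the flexibility the abstract argues black-box tensor operations lack.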