Optimizing Tensor Computations: From Applications to Compilation and Runtime Techniques

Matthias Boehm, Matteo Interlandi, Christopher M. Jermaine

Companion of the 2023 International Conference on Management of Data, June 4, 2023. DOI: 10.1145/3555041.3589407
Machine learning (ML) training and scoring fundamentally rely on linear algebra programs and, more generally, tensor computations. Most ML systems use distributed parameter servers and similar distribution strategies for mini-batch stochastic gradient descent training. However, many more tasks in the data science and engineering lifecycle can benefit from efficient tensor computations. Examples include primitives for data cleaning, data and model debugging, data augmentation, query processing, and numerical simulations, as well as a wide variety of training and scoring algorithms. In this survey tutorial, we first make a case for the importance of optimizing more general tensor computations, and then provide an in-depth survey of existing applications, optimizing compilation techniques, and underlying runtime strategies. Interestingly, there are close connections to data-intensive applications, query rewriting and optimization, as well as query processing and physical design. Our goal for the tutorial is to structure existing work, create common terminology, and identify open research challenges.
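To make the connection between compilation techniques and query rewriting concrete, the following is a minimal NumPy sketch (illustrative only, not taken from the tutorial) of a classic algebraic rewrite over a tall-skinny data matrix: reassociating a matrix-chain product so that an expensive intermediate is never materialized. The names X, v, n, and m are hypothetical placeholders.

```python
# Illustrative sketch of an algebraic rewrite that an optimizing compiler
# for tensor computations might apply (assumed example, not from the paper):
# for a tall-skinny matrix X with n >> m, computing (X^T X) v materializes
# an m-by-m Gram matrix at O(n*m^2) cost, while the rewritten form
# X^T (X v) uses two matrix-vector products at O(n*m) cost.
import numpy as np

rng = np.random.default_rng(0)
n, m = 100_000, 50                 # tall-skinny shape, e.g., a feature matrix
X = rng.standard_normal((n, m))
v = rng.standard_normal(m)

# Naive left-to-right evaluation: builds the m-by-m Gram matrix first.
y1 = (X.T @ X) @ v                 # O(n*m^2) flops

# Rewritten evaluation: two matrix-vector products, no Gram matrix.
y2 = X.T @ (X @ v)                 # O(n*m) flops

assert np.allclose(y1, y2)         # same result, very different cost
```

Rewrites of this flavor are representative of the compilation techniques the tutorial surveys, and the analogy to relational query optimization (e.g., join reordering) is exactly the kind of connection the abstract highlights.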