Declarative abstractions for tensor program partitioning

Dimitrios Vytiniotis
{"title":"Declarative abstractions for tensor program partitioning","authors":"Dimitrios Vytiniotis","doi":"10.1145/3414080.3414105","DOIUrl":null,"url":null,"abstract":"The size of state-of-the-art machine learning models is continuously growing; for instance GPT-3, a recent language model trained by OpenAI, contains 175B parameters. Due to memory limitations and scalability constraints hardware acceleration for such models relies on configuring them as systems of accelerator devices (such as GPUs, or TPUs, or even simple compute cores with fast local memory) with custom interconnect networks. This setting poses a challenge for software: there is an increasing need for flexible ways to distribute these multi-dimensional array programs (tensor programs) across systems of accelerator devices. We outline in this talk how ideas from deforestation and stream fusion are relevant for the domain of tensor programming and partitioning. Specifically, we see how the concept of array “builders”, aiming primarily at code generation, can be extended to array “slicers”. Array slicers, together with algebraic representations of range objects and declarative rewrite rules, can express a variety of different and accelerator-agnostic distribution strategies. We will see how a tensor IR can be extended with such abstractions, how we can drive partitioning through user annotations or interactive tactics, and – as a demonstration – how it may be lowered to a low-level executable dataflow graph of SPMD kernels. We will finally discuss some remaining hard problems and further transformations that are essential for scaling up models on systems of accelerators, and where ideas from declarative programming could prove useful.","PeriodicalId":328721,"journal":{"name":"Proceedings of the 22nd International Symposium on Principles and Practice of Declarative Programming","volume":"54 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 22nd International Symposium on Principles and Practice of Declarative Programming","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3414080.3414105","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

The size of state-of-the-art machine learning models is continuously growing; for instance, GPT-3, a recent language model trained by OpenAI, contains 175B parameters. Due to memory limitations and scalability constraints, hardware acceleration for such models relies on configuring them as systems of accelerator devices (such as GPUs, TPUs, or even simple compute cores with fast local memory) with custom interconnect networks. This setting poses a challenge for software: there is an increasing need for flexible ways to distribute these multi-dimensional array programs (tensor programs) across systems of accelerator devices. We outline in this talk how ideas from deforestation and stream fusion are relevant to the domain of tensor programming and partitioning. Specifically, we see how the concept of array "builders", aimed primarily at code generation, can be extended to array "slicers". Array slicers, together with algebraic representations of range objects and declarative rewrite rules, can express a variety of different, accelerator-agnostic distribution strategies. We will see how a tensor IR can be extended with such abstractions, how we can drive partitioning through user annotations or interactive tactics, and, as a demonstration, how it may be lowered to a low-level executable dataflow graph of SPMD kernels. We will finally discuss some remaining hard problems and further transformations that are essential for scaling up models on systems of accelerators, and where ideas from declarative programming could prove useful.
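
To make the slicing idea concrete, here is a minimal sketch in Haskell (a natural choice, since deforestation and stream fusion originate in that community). All names below (Range, Slicer, rewrite, and so on) are illustrative assumptions, not the talk's actual IR; the single rewrite rule shown is the standard partitioning identity that a matrix product contracted over a sharded axis becomes per-device partial products followed by an all-reduce.

```haskell
-- A minimal sketch of array slicers; hypothetical names, not the talk's IR.

-- Algebraic range objects: an axis is either kept whole or split
-- evenly across a number of devices.
data Range
  = Full Int          -- complete extent of an axis
  | Sharded Int Int   -- extent, number of shards (one per device)
  deriving (Show, Eq)

-- A slicer assigns a range to every axis of a tensor, describing
-- which slice of the logical array each device holds.
newtype Slicer = Slicer [Range]
  deriving (Show, Eq)

-- Shape of the per-device slice (assumes shard counts divide extents).
localShape :: Slicer -> [Int]
localShape (Slicer rs) = map ext rs
  where
    ext (Full n)      = n
    ext (Sharded n k) = n `div` k

-- A toy tensor IR: a matrix product over sliced operands, plus the
-- collective that partitioning introduces.
data Expr
  = MatMul Slicer Slicer
  | AllReduce Expr
  deriving Show

-- One declarative rewrite rule: when the contraction axis is sharded
-- identically on both operands, each device computes a partial matmul
-- and the partial results are summed with an all-reduce.
rewrite :: Expr -> Expr
rewrite e@(MatMul (Slicer [_, Sharded k d]) (Slicer [Sharded k' d', _]))
  | k == k' && d == d' = AllReduce e
rewrite e = e

main :: IO ()
main = do
  let a = Slicer [Full 1024, Sharded 4096 8]  -- A: columns sharded over 8 devices
      b = Slicer [Sharded 4096 8, Full 512]   -- B: rows sharded the same way
  print (localShape a)           -- [1024,512]
  print (rewrite (MatMul a b))   -- AllReduce (MatMul ...)
```

In a system along the lines the abstract describes, lowering would presumably map the AllReduce node to a collective over the interconnect and the remaining local MatMul to one SPMD kernel per device; richer range algebras (offsets, halos, uneven shards) would extend the same pattern.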