Declarative abstractions for tensor program partitioning

Dimitrios Vytiniotis
{"title":"Declarative abstractions for tensor program partitioning","authors":"Dimitrios Vytiniotis","doi":"10.1145/3414080.3414105","DOIUrl":null,"url":null,"abstract":"The size of state-of-the-art machine learning models is continuously growing; for instance GPT-3, a recent language model trained by OpenAI, contains 175B parameters. Due to memory limitations and scalability constraints hardware acceleration for such models relies on configuring them as systems of accelerator devices (such as GPUs, or TPUs, or even simple compute cores with fast local memory) with custom interconnect networks. This setting poses a challenge for software: there is an increasing need for flexible ways to distribute these multi-dimensional array programs (tensor programs) across systems of accelerator devices. We outline in this talk how ideas from deforestation and stream fusion are relevant for the domain of tensor programming and partitioning. Specifically, we see how the concept of array “builders”, aiming primarily at code generation, can be extended to array “slicers”. Array slicers, together with algebraic representations of range objects and declarative rewrite rules, can express a variety of different and accelerator-agnostic distribution strategies. We will see how a tensor IR can be extended with such abstractions, how we can drive partitioning through user annotations or interactive tactics, and – as a demonstration – how it may be lowered to a low-level executable dataflow graph of SPMD kernels. We will finally discuss some remaining hard problems and further transformations that are essential for scaling up models on systems of accelerators, and where ideas from declarative programming could prove useful.","PeriodicalId":328721,"journal":{"name":"Proceedings of the 22nd International Symposium on Principles and Practice of Declarative Programming","volume":"54 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 22nd International Symposium on Principles and Practice of Declarative Programming","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3414080.3414105","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

The size of state-of-the-art machine learning models is continuously growing; for instance, GPT-3, a recent language model trained by OpenAI, contains 175B parameters. Due to memory limitations and scalability constraints, hardware acceleration for such models relies on configuring them as systems of accelerator devices (such as GPUs, TPUs, or even simple compute cores with fast local memory) with custom interconnect networks. This setting poses a challenge for software: there is an increasing need for flexible ways to distribute these multi-dimensional array programs (tensor programs) across systems of accelerator devices. We outline in this talk how ideas from deforestation and stream fusion are relevant to the domain of tensor programming and partitioning. Specifically, we see how the concept of array "builders", aimed primarily at code generation, can be extended to array "slicers". Array slicers, together with algebraic representations of range objects and declarative rewrite rules, can express a variety of different, accelerator-agnostic distribution strategies. We will see how a tensor IR can be extended with such abstractions, how we can drive partitioning through user annotations or interactive tactics, and, as a demonstration, how it may be lowered to a low-level executable dataflow graph of SPMD kernels. We will finally discuss some remaining hard problems and further transformations that are essential for scaling up models on systems of accelerators, and where ideas from declarative programming could prove useful.
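
To make the slicing idea concrete, here is a minimal sketch in Haskell (a natural choice, since deforestation and stream fusion originate in that community). All names below (Range, Slicer, rewrite, and so on) are illustrative assumptions, not the talk's actual IR; the single rewrite rule shown is the standard partitioning identity that a matrix product contracted over a sharded axis becomes per-device partial products followed by an all-reduce.

```haskell
-- A minimal sketch of array slicers; hypothetical names, not the talk's IR.

-- Algebraic range objects: an axis is either kept whole or split
-- evenly across a number of devices.
data Range
  = Full Int          -- complete extent of an axis
  | Sharded Int Int   -- extent, number of shards (one per device)
  deriving (Show, Eq)

-- A slicer assigns a range to every axis of a tensor, describing
-- which slice of the logical array each device holds.
newtype Slicer = Slicer [Range]
  deriving (Show, Eq)

-- Shape of the per-device slice (assumes shard counts divide extents).
localShape :: Slicer -> [Int]
localShape (Slicer rs) = map ext rs
  where
    ext (Full n)      = n
    ext (Sharded n k) = n `div` k

-- A toy tensor IR: a matrix product over sliced operands, plus the
-- collective that partitioning introduces.
data Expr
  = MatMul Slicer Slicer
  | AllReduce Expr
  deriving Show

-- One declarative rewrite rule: when the contraction axis is sharded
-- identically on both operands, each device computes a partial matmul
-- and the partial results are summed with an all-reduce.
rewrite :: Expr -> Expr
rewrite e@(MatMul (Slicer [_, Sharded k d]) (Slicer [Sharded k' d', _]))
  | k == k' && d == d' = AllReduce e
rewrite e = e

main :: IO ()
main = do
  let a = Slicer [Full 1024, Sharded 4096 8]  -- A: columns sharded over 8 devices
      b = Slicer [Sharded 4096 8, Full 512]   -- B: rows sharded the same way
  print (localShape a)           -- [1024,512]
  print (rewrite (MatMul a b))   -- AllReduce (MatMul ...)
```

In a system along the lines the abstract describes, lowering would presumably map the AllReduce node to a collective over the interconnect and the remaining local MatMul to one SPMD kernel per device; richer range algebras (offsets, halos, uneven shards) would extend the same pattern.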