{"title":"Declarative abstractions for tensor program partitioning","authors":"Dimitrios Vytiniotis","doi":"10.1145/3414080.3414105","DOIUrl":null,"url":null,"abstract":"The size of state-of-the-art machine learning models is continuously growing; for instance GPT-3, a recent language model trained by OpenAI, contains 175B parameters. Due to memory limitations and scalability constraints hardware acceleration for such models relies on configuring them as systems of accelerator devices (such as GPUs, or TPUs, or even simple compute cores with fast local memory) with custom interconnect networks. This setting poses a challenge for software: there is an increasing need for flexible ways to distribute these multi-dimensional array programs (tensor programs) across systems of accelerator devices. We outline in this talk how ideas from deforestation and stream fusion are relevant for the domain of tensor programming and partitioning. Specifically, we see how the concept of array “builders”, aiming primarily at code generation, can be extended to array “slicers”. Array slicers, together with algebraic representations of range objects and declarative rewrite rules, can express a variety of different and accelerator-agnostic distribution strategies. We will see how a tensor IR can be extended with such abstractions, how we can drive partitioning through user annotations or interactive tactics, and – as a demonstration – how it may be lowered to a low-level executable dataflow graph of SPMD kernels. We will finally discuss some remaining hard problems and further transformations that are essential for scaling up models on systems of accelerators, and where ideas from declarative programming could prove useful.","PeriodicalId":328721,"journal":{"name":"Proceedings of the 22nd International Symposium on Principles and Practice of Declarative Programming","volume":"54 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 22nd International Symposium on Principles and Practice of Declarative Programming","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3414080.3414105","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
The size of state-of-the-art machine learning models is continuously growing; for instance, GPT-3, a recent language model trained by OpenAI, contains 175 billion parameters. Due to memory limitations and scalability constraints, hardware acceleration for such models relies on configuring them as systems of accelerator devices (such as GPUs, TPUs, or even simple compute cores with fast local memory) with custom interconnect networks. This setting poses a challenge for software: there is an increasing need for flexible ways to distribute these multi-dimensional array programs (tensor programs) across systems of accelerator devices. We outline in this talk how ideas from deforestation and stream fusion are relevant to the domain of tensor programming and partitioning. Specifically, we see how the concept of array "builders", aimed primarily at code generation, can be extended to array "slicers". Array slicers, together with algebraic representations of range objects and declarative rewrite rules, can express a variety of different and accelerator-agnostic distribution strategies. We will see how a tensor IR can be extended with such abstractions, how we can drive partitioning through user annotations or interactive tactics, and, as a demonstration, how it may be lowered to a low-level executable dataflow graph of SPMD kernels. Finally, we will discuss some remaining hard problems and further transformations that are essential for scaling up models on systems of accelerators, and where ideas from declarative programming could prove useful.
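The abstract stops short of showing what a slicer looks like, so the Haskell sketch below is one possible reading rather than the talk's actual IR. All names in it (Range, Slicer, splitRange, rowPartition) are hypothetical, invented here for illustration. It shows how an algebraic range object plus per-input slicers can express a row-partitioned matrix multiply as a declarative rewrite, and how applying that strategy falls out as n independent SPMD kernel instances.

-- A minimal, self-contained sketch (not the talk's actual IR): Range,
-- Slicer, splitRange and rowPartition are hypothetical names invented
-- to illustrate how algebraic range objects and per-input "slicers"
-- can describe an accelerator-agnostic distribution strategy.

-- An algebraic range object: a half-open interval [lo, hi) of one axis.
data Range = Range { lo :: Int, hi :: Int }
  deriving Show

-- Split a range into n roughly equal sub-ranges, one per device.
splitRange :: Int -> Range -> [Range]
splitRange n (Range a b) =
  [ Range (a + i * step) (min b (a + (i + 1) * step)) | i <- [0 .. n - 1] ]
  where step = (b - a + n - 1) `div` n

-- A slicer answers: to produce this range of the output, which range
-- of a given input do I need? This is the declarative core of a
-- distribution strategy.
newtype Slicer = Slicer { inputRange :: Range -> Range }

-- Rewrite rule for a row-partitioned matmul C = A * B:
--   slice rows C  ~>  matmul (slice rows A) (whole B)
-- Each row block of C needs the same row block of A ...
slicerA :: Slicer
slicerA = Slicer id

-- ... and every row block of C needs all k rows of B.
slicerB :: Int -> Slicer
slicerB k = Slicer (const (Range 0 k))

-- Partitioning C's rows over n devices yields n independent kernel
-- instances, each reading its shard of A and a full copy of B: the
-- shape of the SPMD dataflow graph the strategy lowers to.
rowPartition :: Int -> Int -> Range -> [(Range, Range)]
rowPartition nDevices k rows =
  [ (inputRange slicerA r, inputRange (slicerB k) r)
  | r <- splitRange nDevices rows ]

main :: IO ()
main = mapM_ print (rowPartition 4 512 (Range 0 1000))

Running this prints four (rows of A, rows of B) pairs, one per device. In a real tensor IR, the rewrite rule in the comment would fire on an annotated operation (for example, a user annotation asking for the rows of C to be sharded four ways), and each pair would become one device-local kernel in the lowered dataflow graph.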