TapirXLA: Embedding Fork-Join Parallelism into the XLA Compiler in TensorFlow Using Tapir

T. Schardl, S. Samsi
{"title":"TapirXLA: Embedding Fork-Join Parallelism into the XLA Compiler in TensorFlow Using Tapir","authors":"T. Schardl, S. Samsi","doi":"10.1109/HPEC.2019.8916312","DOIUrl":null,"url":null,"abstract":"This work introduces TapirXLA, a replacement for TensorFlow’s XLA compiler that embeds recursive fork-join parallelism into XLA’s low-level representation of code. Machine-learning applications employ a variety of technologies to improve performance, including compiler technology. But compilers in machine-learning frameworks lack a deep understanding of parallelism, causing them to lose performance by missing optimizations on parallel computation. This work studies how Tapir, a compiler intermediate representation (IR) that embeds parallelism into a mainstream compiler IR, can be incorporated into a compiler for machine learning to remedy this problem. TapirXLA modifies the XLA compiler in TensorFlow to employ the Tapir/LLVM compiler to optimize low-level parallel computation. TapirXLA encodes the parallelism within high-level TensorFlow operations using Tapir’s representation of fork-join parallelism. Furthermore, TapirXLA exposes to the compiler implementations of linear-algebra library routines whose parallel operations are encoded using Tapir’s representation. We compared the performance of TensorFlow using TapirXLA against TensorFlow using an unmodified XLA compiler. On four neural-network benchmarks, TapirXLA speeds up the parallel running time of the network by a geometric-mean multiplicative factor of 30% to 100%, across four CPU architectures.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPEC.2019.8916312","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

This work introduces TapirXLA, a replacement for TensorFlow’s XLA compiler that embeds recursive fork-join parallelism into XLA’s low-level representation of code. Machine-learning applications employ a variety of technologies to improve performance, including compiler technology. But compilers in machine-learning frameworks lack a deep understanding of parallelism, causing them to lose performance by missing optimizations on parallel computation. This work studies how Tapir, a compiler intermediate representation (IR) that embeds parallelism into a mainstream compiler IR, can be incorporated into a compiler for machine learning to remedy this problem. TapirXLA modifies the XLA compiler in TensorFlow to employ the Tapir/LLVM compiler to optimize low-level parallel computation. TapirXLA encodes the parallelism within high-level TensorFlow operations using Tapir’s representation of fork-join parallelism. Furthermore, TapirXLA exposes to the compiler implementations of linear-algebra library routines whose parallel operations are encoded using Tapir’s representation. We compared the performance of TensorFlow using TapirXLA against TensorFlow using an unmodified XLA compiler. On four neural-network benchmarks, TapirXLA speeds up the parallel running time of the network by a geometric-mean multiplicative factor of 30% to 100%, across four CPU architectures.
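To make the abstract's description concrete, below is a minimal sketch, not drawn from the paper, of the kind of recursive fork-join computation that Tapir represents directly in compiler IR. It uses the Cilk keywords cilk_spawn and cilk_sync, which a Tapir-enabled compiler such as Tapir/LLVM lowers to Tapir's three IR instructions: detach, reattach, and sync. The fib function here is the standard illustrative example for fork-join parallelism, not code from TapirXLA itself.

#include <cilk/cilk.h>

long fib(long n) {
    if (n < 2)
        return n;
    long x, y;
    x = cilk_spawn fib(n - 1); /* lowered to a Tapir "detach": the spawned
                                  call may run in parallel with the
                                  continuation below */
    y = fib(n - 2);            /* continuation; the detached task ends with
                                  a Tapir "reattach" into this block */
    cilk_sync;                 /* Tapir "sync": join all tasks spawned in
                                  this function before reading x */
    return x + y;
}

Because detach, reattach, and sync are ordinary instructions in the IR, standard optimizations can analyze and transform code across fork-join boundaries instead of treating each parallel task as an opaque call into a runtime library. TapirXLA applies this idea inside XLA, encoding the parallelism of TensorFlow operations and of linear-algebra library routines in this representation so that Tapir/LLVM can optimize the low-level parallel code.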