TapirXLA: Embedding Fork-Join Parallelism into the XLA Compiler in TensorFlow Using Tapir

T. Schardl, S. Samsi
{"title":"TapirXLA: Embedding Fork-Join Parallelism into the XLA Compiler in TensorFlow Using Tapir","authors":"T. Schardl, S. Samsi","doi":"10.1109/HPEC.2019.8916312","DOIUrl":null,"url":null,"abstract":"This work introduces TapirXLA, a replacement for TensorFlow’s XLA compiler that embeds recursive fork-join parallelism into XLA’s low-level representation of code. Machine-learning applications employ a variety of technologies to improve performance, including compiler technology. But compilers in machine-learning frameworks lack a deep understanding of parallelism, causing them to lose performance by missing optimizations on parallel computation. This work studies how Tapir, a compiler intermediate representation (IR) that embeds parallelism into a mainstream compiler IR, can be incorporated into a compiler for machine learning to remedy this problem. TapirXLA modifies the XLA compiler in TensorFlow to employ the Tapir/LLVM compiler to optimize low-level parallel computation. TapirXLA encodes the parallelism within high-level TensorFlow operations using Tapir’s representation of fork-join parallelism. Furthermore, TapirXLA exposes to the compiler implementations of linear-algebra library routines whose parallel operations are encoded using Tapir’s representation. We compared the performance of TensorFlow using TapirXLA against TensorFlow using an unmodified XLA compiler. On four neural-network benchmarks, TapirXLA speeds up the parallel running time of the network by a geometric-mean multiplicative factor of 30% to 100%, across four CPU architectures.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPEC.2019.8916312","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

This work introduces TapirXLA, a replacement for TensorFlow’s XLA compiler that embeds recursive fork-join parallelism into XLA’s low-level representation of code. Machine-learning applications employ a variety of technologies to improve performance, including compiler technology. But compilers in machine-learning frameworks lack a deep understanding of parallelism, causing them to lose performance by missing optimizations on parallel computation. This work studies how Tapir, a compiler intermediate representation (IR) that embeds parallelism into a mainstream compiler IR, can be incorporated into a compiler for machine learning to remedy this problem. TapirXLA modifies the XLA compiler in TensorFlow to employ the Tapir/LLVM compiler to optimize low-level parallel computation. TapirXLA encodes the parallelism within high-level TensorFlow operations using Tapir’s representation of fork-join parallelism. Furthermore, TapirXLA exposes to the compiler implementations of linear-algebra library routines whose parallel operations are encoded using Tapir’s representation. We compared the performance of TensorFlow using TapirXLA against TensorFlow using an unmodified XLA compiler. On four neural-network benchmarks, TapirXLA speeds up the parallel running time of the network by a geometric-mean multiplicative factor of 30% to 100%, across four CPU architectures.
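To make the abstract's description concrete, below is a minimal sketch, not drawn from the paper, of the kind of recursive fork-join computation that Tapir represents directly in compiler IR. It uses the Cilk keywords cilk_spawn and cilk_sync, which a Tapir-enabled compiler such as Tapir/LLVM lowers to Tapir's three IR instructions: detach, reattach, and sync. The fib function here is the standard illustrative example for fork-join parallelism, not code from TapirXLA itself.

#include <cilk/cilk.h>

long fib(long n) {
    if (n < 2)
        return n;
    long x, y;
    x = cilk_spawn fib(n - 1); /* lowered to a Tapir "detach": the spawned
                                  call may run in parallel with the
                                  continuation below */
    y = fib(n - 2);            /* continuation; the detached task ends with
                                  a Tapir "reattach" into this block */
    cilk_sync;                 /* Tapir "sync": join all tasks spawned in
                                  this function before reading x */
    return x + y;
}

Because detach, reattach, and sync are ordinary instructions in the IR, standard optimizations can analyze and transform code across fork-join boundaries instead of treating each parallel task as an opaque call into a runtime library. TapirXLA applies this idea inside XLA, encoding the parallelism of TensorFlow operations and of linear-algebra library routines in this representation so that Tapir/LLVM can optimize the low-level parallel code.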