Graph neural networks with configuration cross-attention for tensor compilers
Dmitrii Khizbullin, Eduardo Rocha de Andrade, Thanh Hau Nguyen, Matheus Pedroza Ferreira, David R. Pugh
arXiv:2405.16623 (arXiv - CS - Performance), published 2024-05-26
Abstract
With the recent popularity of neural networks comes the need for efficient
serving of inference workloads. A neural network inference workload can be
represented as a computational graph with nodes as operators transforming
multidimensional tensors. The tensors can be transposed and/or tiled in a
combinatorially large number of ways, some configurations leading to
accelerated inference. We propose TGraph, a neural graph architecture that
allows screening for fast configurations of the target computational graph,
thus representing an artificial-intelligence (AI) tensor compiler, in contrast
to traditional heuristics-based compilers. The proposed solution improves the
mean Kendall's $\tau$ across the layout collections of TpuGraphs from 29.8% for
a reliable baseline to 67.4% for TGraph. We estimate the potential CO$_2$ emission
reduction associated with our work to be equivalent to over 50% of the total
household emissions in the areas hosting AI-oriented data centers.
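The evaluation metric quoted above, Kendall's $\tau$, scores how well a model's predicted ranking of layout configurations agrees with the ranking by measured runtime: $\tau = 1$ means the orderings match exactly, $\tau = -1$ means they are fully reversed. A minimal stdlib-only sketch of the tau-a variant follows; the predicted-cost and measured-runtime values are hypothetical and not taken from the paper.

```python
# Illustrative sketch (not the paper's code): Kendall's tau-a rank correlation
# between predicted configuration costs and measured runtimes.
from itertools import combinations

def kendall_tau(a, b):
    """Kendall's tau-a between two equal-length score sequences."""
    assert len(a) == len(b) and len(a) >= 2
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        # A pair is concordant if both sequences order items i and j the same way.
        s = (a[i] - a[j]) * (b[i] - b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical predicted costs vs. measured runtimes (ms) for 5 layouts;
# the model mis-orders one pair (the 3rd and 4th configurations).
predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
measured = [1.2, 1.9, 3.5, 3.1, 5.0]
print(kendall_tau(predicted, measured))  # 9 concordant, 1 discordant -> 0.8
```

Tau-a divides by all pairs and so penalizes ties; library implementations such as SciPy's default to the tie-adjusted tau-b, which can differ when scores repeat.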