Dmitrii Khizbullin, Eduardo Rocha de Andrade, Thanh Hau Nguyen, Matheus Pedroza Ferreira, David R. Pugh
arXiv:2405.16623 · arXiv - CS - Performance · 2024-05-26
Graph neural networks with configuration cross-attention for tensor compilers
With the recent popularity of neural networks comes the need for efficient
serving of inference workloads. A neural network inference workload can be
represented as a computational graph with nodes as operators transforming
multidimensional tensors. The tensors can be transposed and/or tiled in a
combinatorially large number of ways, some configurations leading to
accelerated inference. We propose TGraph, a neural graph architecture that
allows screening for fast configurations of the target computational graph,
thus representing an artificial intelligence (AI) tensor compiler in contrast
to the traditional heuristics-based compilers. The proposed solution improves
mean Kendall's $\tau$ across the layout collections of TpuGraphs from 29.8% for
the reliable baseline to 67.4% for TGraph. We estimate the potential CO$_2$ emission
reduction associated with our work to be equivalent to over 50% of the total
household emissions in the areas hosting AI-oriented data centers.
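The quality metric reported above, Kendall's $\tau$, measures how well a model's predicted ordering of configurations agrees with the ordering by measured runtime. As an illustrative sketch only (not the paper's evaluation code, and with made-up runtime numbers), the tie-free version of the metric can be computed as:

```python
def kendall_tau(pred, true):
    """Kendall's tau rank correlation between two score lists (assumes no ties).

    tau = (concordant - discordant) / total pairs, ranging from -1 to 1.
    A screening model that ranks fast tensor-layout configurations correctly
    scores close to 1 against the measured runtimes.
    """
    n = len(pred)
    assert n == len(true) and n > 1
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # A pair (i, j) is concordant when both lists order it the same way.
            s = (pred[i] - pred[j]) * (true[i] - true[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical predicted vs. measured runtimes for five layout configurations.
predicted = [0.9, 1.1, 2.0, 1.5, 3.0]
measured = [1.0, 1.2, 1.9, 1.6, 2.8]
print(kendall_tau(predicted, measured))  # 1.0: the rankings agree on every pair
```

Reporting $\tau$ as a percentage, as the abstract does, corresponds to multiplying this value by 100.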