Revisiting Loop Tiling for Datacenters: Live and Let Live

Proceedings of the 2018 International Conference on Supercomputing. Pub Date: 2018-06-12. DOI: 10.1145/3205289.3205306

Jiacheng Zhao, Huimin Cui, Yalin Zhang, Jingling Xue, Xiaobing Feng


Citations: 7

Abstract


Revisiting Loop Tiling for Datacenters: Live and Let Live

As DNNs gain popularity in modern datacenters, it becomes imperative to revisit compiler optimizations for DNNs in a colocation scenario. Loop tiling turns out to be the most significant compiler optimization, since DNNs typically apply a series of matrix computations iteratively to a massive amount of data. We introduce a reuse-pattern-centric approach to obtaining a peer-aware TSS (Tile Size Selection) model for a matrix-based application A. Our key insight is that the co-running cache behavior of A (once tiled) can be determined by its data reuse patterns, together with the cache pressure exerted by its co-running peers, without needing to analyze the code of its co-runners. Compared with static tiling (which determines a tile size for A statically without considering its co-running peers), our peer-aware tiling enables compilers to generate either faster peer-aware efficient code for A (by optimizing the performance of A) or faster peer-aware nice code for A (by optimizing the performance of its co-runners). In addition, our peer-aware tiling enables library developers to improve the performance of library routines more effectively than static tiling.
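To make the idea of tile size selection concrete, the following is a minimal Python sketch of loop tiling for matrix multiplication, together with a deliberately simple tile-size heuristic. It is an illustration only and is not the TSS model from the paper, whose details the abstract does not give. The function names `tiled_matmul` and `pick_tile_size`, the `peer_pressure_bytes` parameter, and the rule "three square tiles must fit in the cache space left over by peers" are all assumptions introduced here.

```python
def tiled_matmul(a, b, tile):
    """Multiply square matrices a and b (lists of lists) in tile x tile blocks."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    # The three outer loops walk over blocks; the three inner loops
    # work inside one block, so a small working set is reused repeatedly.
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    row_a = a[i]
                    row_c = c[i]
                    for k in range(kk, min(kk + tile, n)):
                        a_ik = row_a[k]
                        row_b = b[k]
                        for j in range(jj, min(jj + tile, n)):
                            row_c[j] += a_ik * row_b[j]
    return c


def pick_tile_size(cache_bytes, peer_pressure_bytes, elem_bytes=8):
    """Hypothetical heuristic (an assumption, not the paper's model).

    Returns the largest T such that three T x T tiles of elem_bytes-sized
    elements fit in the cache space not claimed by co-running peers.
    Always returns at least 1.
    """
    available = max(cache_bytes - peer_pressure_bytes, 0)
    tile = 1
    while 3 * (tile + 1) ** 2 * elem_bytes <= available:
        tile += 1
    return tile


if __name__ == "__main__":
    # Same 32 KiB cache, with and without a peer occupying half of it.
    print(pick_tile_size(32 * 1024, 0))
    print(pick_tile_size(32 * 1024, 16 * 1024))
```

In this sketch, a static choice corresponds to calling `pick_tile_size` with `peer_pressure_bytes=0`, while a peer-aware choice shrinks the tile as the assumed pressure from co-runners grows. How that pressure is measured, and how reuse patterns enter the model, is left to the paper itself.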

