Nanoblock Unroll: Towards the Automatic Generation of Stencil Codes with the Optimal Performance

T. Muranushi, Keigo Nitadori, J. Makino
Published in: Proceedings of the Second Workshop on Optimizing Stencil Computations, 2014-10-20. DOI: 10.1145/2686745.2686746 (https://doi.org/10.1145/2686745.2686746)
Citations: 2

Abstract

A number of automatic code generation systems have been proposed for stencil computations on modern parallel computers. However, the code they generate is rather inefficient: typically it achieves less than 10% of the peak performance of the platform. The primary cause of this inefficiency is that the generated code uses several layers of array indices for each array access. These layers of indices prevent the compiler from generating efficient assembly code. In this paper we propose a new approach to automatic code generation in which the generated code is "compiler-friendly", in the sense that compilers can produce more highly optimized assembly code from it than from typical automatically generated code. We demonstrate the effectiveness of our approach with a simple example, the diffusion equation on a small grid. The measured efficiency can reach 85% of the theoretical peak.
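
The abstract does not spell out the generated code, so the following is only a minimal sketch of the contrast it describes, not the authors' method. It assumes a 1D explicit diffusion step on a hypothetical grid of `N` points; the names `step_indirect`, `step_direct`, `map`, and `c` are illustrative. A single index-mapping table stands in for the "several layers of array indices", and the second function shows the same update with direct, compile-time-bounded indexing that a compiler can analyze more easily.

```c
#include <assert.h>
#include <stddef.h>

#define N 8 /* hypothetical small grid size, fixed at compile time */

/* Indirect style: every access goes through an index-mapping table.
 * The compiler cannot tell which elements are touched (or whether they
 * alias), which makes vectorization and scheduling harder. */
void step_indirect(const double *u, double *v, const size_t *map, double c)
{
    v[map[0]] = u[map[0]];         /* fixed boundary values */
    v[map[N - 1]] = u[map[N - 1]];
    for (size_t i = 1; i + 1 < N; ++i) {
        assert(map[i] < N);        /* guard the illustrative mapping */
        v[map[i]] = u[map[i]]
                  + c * (u[map[i - 1]] - 2.0 * u[map[i]] + u[map[i + 1]]);
    }
}

/* Direct style: constant trip count, unit-stride accesses, and
 * restrict-qualified pointers, so the access pattern is visible
 * to the compiler. */
void step_direct(const double *restrict u, double *restrict v, double c)
{
    v[0] = u[0];                   /* fixed boundary values */
    v[N - 1] = u[N - 1];
    for (size_t i = 1; i + 1 < N; ++i)
        v[i] = u[i] + c * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
}
```

With an identity `map`, both functions compute the same update; the difference lies only in how much the compiler can infer about the memory accesses. Whether the direct form approaches the efficiency reported in the abstract depends on the compiler, the hardware, and the actual generated code, none of which this sketch measures.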