Nanoblock Unroll: Towards the Automatic Generation of Stencil Codes with the Optimal Performance
T. Muranushi, Keigo Nitadori, J. Makino
Proceedings of the Second Workshop on Optimizing Stencil Computations, 2014-10-20
DOI: 10.1145/2686745.2686746 (https://doi.org/10.1145/2686745.2686746)
Citations: 2
Abstract
A number of automatic code generation systems have been proposed for stencil computations on modern parallel computers. However, the code they generate is rather inefficient: typically it achieves less than 10% of the peak performance of the platform. The primary cause of this inefficiency is that the generated code contains several layers of array indices for its array accesses. These layers of indices prevent the compiler from generating efficient assembly code. In this paper we propose a new approach to automatic code generation in which the generated code is "compiler-friendly", in the sense that compilers can produce more highly optimized assembly code from it than from typical automatically generated code. We demonstrate the effectiveness of our approach with a simple example of a diffusion equation on a small grid. The measured efficiency can reach 85% of the theoretical peak.
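To make the indexing issue concrete, the following is a minimal sketch in C of one explicit time step of a 1D diffusion stencil, written twice. It is an illustration only and not the paper's generator output or the nanoblock unroll scheme itself, which the abstract does not describe in detail. The names `diffuse_layered`, `diffuse_direct`, `idx`, `alpha`, and the grid size `N` are all assumptions introduced here. The first version routes every access through an index table, standing in for the "layers of array indices" mentioned above; the second uses direct indices, a compile-time trip count, and `restrict` pointers, which is the kind of form a compiler can usually unroll and vectorize more easily.

```c
#include <assert.h>
#include <stddef.h>

#define N 16  /* hypothetical small grid size, fixed at compile time */

/* Layered indexing (illustrative): every access goes through the index
   table idx, so the memory access pattern is not visible to the compiler. */
void diffuse_layered(const double *u, double *out, const size_t *idx,
                     size_t n, double alpha)
{
    assert(n >= 3);
    /* Boundary points are copied unchanged. */
    out[idx[0]] = u[idx[0]];
    out[idx[n - 1]] = u[idx[n - 1]];
    for (size_t i = 1; i + 1 < n; ++i) {
        out[idx[i]] = u[idx[i]]
                    + alpha * (u[idx[i - 1]] - 2.0 * u[idx[i]] + u[idx[i + 1]]);
    }
}

/* Direct indexing (illustrative): constant bounds, unit-stride accesses,
   and restrict-qualified pointers that promise u and out do not overlap. */
void diffuse_direct(const double *restrict u, double *restrict out,
                    double alpha)
{
    /* Boundary points are copied unchanged. */
    out[0] = u[0];
    out[N - 1] = u[N - 1];
    for (int i = 1; i < N - 1; ++i) {
        out[i] = u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
    }
}
```

Both functions compute the same update when `idx` is the identity mapping; any performance difference between them depends on the compiler, its flags, and the target platform, and is not claimed here.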