Index Array Flattening Through Program Transformation
R. Das, P. Havlak, J. Saltz, K. Kennedy
Proceedings of the IEEE/ACM SC95 Conference, published 1995-12-08
DOI: 10.1145/224170.224420 (https://doi.org/10.1145/224170.224420)
Citations: 37
Abstract
This paper presents techniques for compiling loops with complex, indirect array accesses into loops whose array references have at most one level of indirection. The transformation allows prefetching of array indices for more efficient structuring of communication on distributed-memory machines. It can also improve performance on other architectures by enabling prefetching of data between levels of the memory hierarchy or exploitation of hardware support for vectorized gather/scatter. Our techniques are implemented in a compiler for Fortran D and execution speed improvements are given for multiprocessor and vector machines.
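As a rough illustration of what reducing an access to "at most one level of indirection" can look like, the sketch below splits a doubly indirect read into two passes. It is a minimal, hand-written example under stated assumptions, not the paper's compiler algorithm: the names `x`, `ia`, `ib`, `nested_access`, and `flattened_access` are hypothetical, and Python lists stand in for Fortran D arrays.

```python
def nested_access(x, ia, ib):
    """Original form: x is read through two levels of indirection."""
    return [x[ia[ib[i]]] for i in range(len(ib))]


def flattened_access(x, ia, ib):
    """Flattened form: every loop has at most one level of indirection."""
    # Pass 1: gather the indices themselves. ib[i] is a direct reference,
    # so ia[ib[i]] involves a single level of indirection.
    flat = [ia[ib[i]] for i in range(len(ib))]
    # Pass 2: gather the data through the precomputed index array.
    return [x[flat[i]] for i in range(len(flat))]


# Hypothetical sample data chosen only for illustration.
x = [10.0, 20.0, 30.0, 40.0]
ia = [3, 0, 2, 1]
ib = [1, 1, 2, 0]

# Both forms read the same elements of x.
assert nested_access(x, ia, ib) == flattened_access(x, ia, ib)
```

Because the second pass knows its full index array `flat` before any element of `x` is touched, those indices could in principle be prefetched or handed to a gather operation, which is the kind of opportunity the abstract describes.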