Compiler optimizations for improving data locality

ASPLOS VI, Pub Date: 1994-11-01, DOI: 10.1145/195473.195557
S. Carr, K. McKinley, C. Tseng
Citations: 332

Abstract

In the past decade, processor speed has become significantly faster than memory speed. Small, fast cache memories are designed to overcome this discrepancy, but they are only effective when programs exhibit data locality. In this paper, we present compiler optimizations to improve data locality based on a simple yet accurate cost model. The model computes both temporal and spatial reuse of cache lines to find desirable loop organizations. The cost model drives the application of compound transformations consisting of loop permutation, loop fusion, loop distribution, and loop reversal. We demonstrate that these program transformations are useful for optimizing many programs.

To validate our optimization strategy, we implemented our algorithms and ran experiments on a large collection of scientific programs and kernels. Experiments with kernels illustrate that our model and algorithm can select and achieve the best performance. For over thirty complete applications, we executed the original and transformed versions and simulated cache hit rates. We collected statistics about the inherent characteristics of these programs and our ability to improve their data locality. To our knowledge, these studies are the first of such breadth and depth. We found performance improvements were difficult to achieve because benchmark programs typically have high hit rates even for small data caches; however, our optimizations significantly improved several programs.
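
The effect of loop permutation on spatial reuse can be illustrated with a small simulation. The sketch below is not the paper's cost model or its experimental setup. It assumes a hypothetical direct-mapped cache (64 lines of 8 elements each) and a row-major 256 x 256 array, and it counts misses for the two possible orders of a two-deep loop nest. The names `simulate_misses` and `traversal` and all parameters are illustrative assumptions.

```python
def simulate_misses(addresses, num_lines=64, line_elems=8):
    """Count misses in a tiny direct-mapped cache (assumed, illustrative parameters)."""
    tags = [None] * num_lines           # block currently held by each cache slot
    misses = 0
    for addr in addresses:
        block = addr // line_elems      # cache line (block) containing this element
        slot = block % num_lines        # direct-mapped placement of that block
        if tags[slot] != block:         # block not resident: miss, then fill the slot
            tags[slot] = block
            misses += 1
    return misses


def traversal(n, m, unit_stride_inner):
    """Yield element addresses of a row-major n x m array for a two-deep loop nest."""
    if unit_stride_inner:
        # Inner loop walks j, which is contiguous in row-major layout (stride 1).
        for i in range(n):
            for j in range(m):
                yield i * m + j
    else:
        # Permuted nest: inner loop walks i, so consecutive accesses are m elements apart.
        for j in range(m):
            for i in range(n):
                yield i * m + j


N = M = 256
total = N * M
unit_stride_misses = simulate_misses(traversal(N, M, True))
strided_misses = simulate_misses(traversal(N, M, False))

print("unit-stride inner loop:", unit_stride_misses, "misses of", total, "accesses")
print("strided inner loop:    ", strided_misses, "misses of", total, "accesses")
```

Under these assumed parameters, the unit-stride order misses once per cache line (8,192 misses, a hit rate of 87.5%), while the permuted order misses on every access (65,536 misses), because successive rows map to the same two cache slots and evict each other. A compiler guided by a locality cost model would prefer the first loop order whenever the permutation is legal.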