Cache-Optimal Methods for Bit-Reversals

Zhao Zhang, Xiaodong Zhang
{"title":"Cache-Optimal Methods for Bit-Reversals","authors":"Zhao Zhang, Xiaodong Zhang","doi":"10.1145/331532.331558","DOIUrl":null,"url":null,"abstract":"Bit-reversals are representative and important data reordering operations in many scientific computations. Performance degradation is mainly caused by cache conflict misses. Bit-reversals are often repeatedly used as fundamental subroutines for many scientific programs. Thus, in order to gain the best performance, cache-optimal methods and their implementations should be carefully and precisely done at the programming level. This type of performance programming for some special programs, such as the data reorderings, may significantly outperform an optimization from an automatic tool, such as a compiler. In this paper, we examine different methods using techniques of blocking, buffering, and padding for efficient implementations. We evaluate the merits and limits of each technique and their application and architecture-dependent conditions for developing cache-optimal methods. We present two contributions in this paper: (1) Our integrated blocking methods, which match cache associativity and TLB cache size and which fully use the available registers are cache-optimal and fast. (2) We show that our padding methods outperform other software oriented methods, and believe they are the fastest in terms of minimizing both CPU and memory access cycles. Since the padding methods are almost independent of hardware, they could be widely used on many uniprocessor workstations and SMP multiprocessors.","PeriodicalId":354898,"journal":{"name":"ACM/IEEE SC 1999 Conference (SC'99)","volume":"107 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM/IEEE SC 1999 Conference (SC'99)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/331532.331558","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10

Abstract

Bit-reversals are representative and important data reordering operations in many scientific computations. Performance degradation is mainly caused by cache conflict misses. Bit-reversals are often repeatedly used as fundamental subroutines for many scientific programs. Thus, in order to gain the best performance, cache-optimal methods and their implementations should be carefully and precisely done at the programming level. This type of performance programming for some special programs, such as the data reorderings, may significantly outperform an optimization from an automatic tool, such as a compiler. In this paper, we examine different methods using techniques of blocking, buffering, and padding for efficient implementations. We evaluate the merits and limits of each technique and their application and architecture-dependent conditions for developing cache-optimal methods. We present two contributions in this paper: (1) Our integrated blocking methods, which match cache associativity and TLB cache size and which fully use the available registers are cache-optimal and fast. (2) We show that our padding methods outperform other software oriented methods, and believe they are the fastest in terms of minimizing both CPU and memory access cycles. Since the padding methods are almost independent of hardware, they could be widely used on many uniprocessor workstations and SMP multiprocessors.
位反转的缓存最优方法
位反转是许多科学计算中具有代表性的重要数据重排序操作。性能下降的主要原因是缓存冲突丢失。位反转经常被反复用作许多科学程序的基本子程序。因此,为了获得最佳性能,缓存最优方法及其实现应该在编程级别仔细而精确地完成。对于某些特殊程序(如数据重排序),这种类型的性能编程可能明显优于自动工具(如编译器)的优化。在本文中,我们研究了使用阻塞、缓冲和填充技术的不同方法,以获得有效的实现。我们评估了每种技术的优点和局限性,以及它们的应用和开发缓存优化方法的体系结构依赖条件。我们在本文中提出了两个贡献:(1)我们的集成阻塞方法匹配缓存关联性和TLB缓存大小,充分利用可用寄存器,是缓存最优的和快速的。(2)我们表明,我们的填充方法优于其他面向软件的方法,并且相信它们在最小化CPU和内存访问周期方面是最快的。由于填充方法几乎与硬件无关,因此可以在许多单处理器工作站和SMP多处理器上广泛使用。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信