使用可视化工具优化离散小波变换的缓存性能

J. Tao, A. Shahbahrami, B. Juurlink, R. Buchty, Wolfgang Karl, S. Vassiliadis
{"title":"使用可视化工具优化离散小波变换的缓存性能","authors":"J. Tao, A. Shahbahrami, B. Juurlink, R. Buchty, Wolfgang Karl, S. Vassiliadis","doi":"10.1109/ISM.2007.12","DOIUrl":null,"url":null,"abstract":"The 2D DWT consists of two 1D DWT in both directions: horizontal filtering processes the rows followed by vertical filtering processes the columns. It is well known that a straightforward implementation of the vertical filtering shows quite different performance with various working set sizes. The only reasonable explanation for this has to be the access behavior of the cache memory. As known, vertical filtering has mapping conflicts in the cache with a working set size that is power of two. However, it is not clear how this conflict forms and whether cache problems exist with other data sizes. Such knowledge is the base for efficient code optimization. In order to acquire this knowledge and to achieve more accurate optimization potentials, we apply a cache visualization tool to examine the runtime cache activities of the vertical implementation. We find that besides mapping conflicts, vertical filtering also shows a large number of capacity misses. More specifically, the visualization tool allows us to detect the parameters related to the strategies. This guarantees the feasibility of the optimization. Our initial experimental results on several different architectures show an up to 215% gain in execution time compared to an already optimized baseline implementation.","PeriodicalId":129680,"journal":{"name":"Ninth IEEE International Symposium on Multimedia (ISM 2007)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":"{\"title\":\"Optimizing Cache Performance of the Discrete Wavelet Transform Using a Visualization Tool\",\"authors\":\"J. Tao, A. Shahbahrami, B. Juurlink, R. Buchty, Wolfgang Karl, S. Vassiliadis\",\"doi\":\"10.1109/ISM.2007.12\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The 2D DWT consists of two 1D DWT in both directions: horizontal filtering processes the rows followed by vertical filtering processes the columns. It is well known that a straightforward implementation of the vertical filtering shows quite different performance with various working set sizes. The only reasonable explanation for this has to be the access behavior of the cache memory. As known, vertical filtering has mapping conflicts in the cache with a working set size that is power of two. However, it is not clear how this conflict forms and whether cache problems exist with other data sizes. Such knowledge is the base for efficient code optimization. In order to acquire this knowledge and to achieve more accurate optimization potentials, we apply a cache visualization tool to examine the runtime cache activities of the vertical implementation. We find that besides mapping conflicts, vertical filtering also shows a large number of capacity misses. More specifically, the visualization tool allows us to detect the parameters related to the strategies. This guarantees the feasibility of the optimization. Our initial experimental results on several different architectures show an up to 215% gain in execution time compared to an already optimized baseline implementation.\",\"PeriodicalId\":129680,\"journal\":{\"name\":\"Ninth IEEE International Symposium on Multimedia (ISM 2007)\",\"volume\":\"20 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2007-12-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"14\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Ninth IEEE International Symposium on Multimedia (ISM 2007)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISM.2007.12\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Ninth IEEE International Symposium on Multimedia (ISM 2007)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISM.2007.12","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 14

摘要

2D DWT由两个方向上的两个1D DWT组成:水平过滤处理行,然后垂直过滤处理列。众所周知,垂直滤波的直接实现在不同的工作集大小下表现出截然不同的性能。对此唯一合理的解释是缓存内存的访问行为。众所周知,垂直过滤在缓存中具有映射冲突,其工作集大小为2的幂。但是,目前尚不清楚这种冲突是如何形成的,以及其他数据大小是否存在缓存问题。这些知识是高效代码优化的基础。为了获得这些知识并获得更准确的优化潜力,我们应用缓存可视化工具来检查垂直实现的运行时缓存活动。我们发现除了映射冲突外,垂直过滤也会出现大量的容量缺失。更具体地说,可视化工具允许我们检测与策略相关的参数。这保证了优化的可行性。我们在几个不同架构上的初步实验结果显示,与已经优化的基线实现相比,执行时间最多可增加215%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Optimizing Cache Performance of the Discrete Wavelet Transform Using a Visualization Tool
The 2D DWT consists of two 1D DWT in both directions: horizontal filtering processes the rows followed by vertical filtering processes the columns. It is well known that a straightforward implementation of the vertical filtering shows quite different performance with various working set sizes. The only reasonable explanation for this has to be the access behavior of the cache memory. As known, vertical filtering has mapping conflicts in the cache with a working set size that is power of two. However, it is not clear how this conflict forms and whether cache problems exist with other data sizes. Such knowledge is the base for efficient code optimization. In order to acquire this knowledge and to achieve more accurate optimization potentials, we apply a cache visualization tool to examine the runtime cache activities of the vertical implementation. We find that besides mapping conflicts, vertical filtering also shows a large number of capacity misses. More specifically, the visualization tool allows us to detect the parameters related to the strategies. This guarantees the feasibility of the optimization. Our initial experimental results on several different architectures show an up to 215% gain in execution time compared to an already optimized baseline implementation.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信