Authors: P. Panda, Hiroshi Nakamura, N. Dutt, A. Nicolau
Venue: Proceedings International Conference on Computer Design VLSI in Computers and Processors
Published: 1997-10-12
DOI: 10.1109/ICCD.1997.628925 (https://doi.org/10.1109/ICCD.1997.628925)
Citations: 20 (source: Semanticscholar)
A data alignment technique for improving cache performance
We address the problem of improving the data cache performance of numerical applications, specifically those with blocked (or tiled) loops. We present DAT, a data alignment technique that uses array padding to improve program performance by minimizing cache conflict misses. We describe algorithms for selecting tile sizes that maximize data cache utilization, and for computing pad sizes that eliminate self-interference conflicts in the chosen tile. We also present a generalization of the technique to handle applications with several tiled arrays. Our experimental results, comparing our technique with previously published approaches on machines with different cache configurations, show consistently good performance on several benchmark programs for a variety of problem sizes.
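The idea of padding away self-interference in a tile can be illustrated with a small sketch. This is a toy model and not the DAT algorithm from the paper: it assumes a direct-mapped cache with one-element lines and a row-major array, and it finds a pad by brute-force search. The function names `tile_conflicts` and `smallest_pad` are hypothetical and chosen only for this example.

```python
def tile_conflicts(row_len, tile_rows, tile_cols, cache_elems):
    """Count tile elements that map to a cache slot already used by
    another element of the same tile (self-interference).

    Assumptions for illustration: direct-mapped cache, one element per
    cache line, row-major array whose rows hold `row_len` elements.
    """
    seen = set()
    conflicts = 0
    for i in range(tile_rows):
        for j in range(tile_cols):
            # Element (i, j) of the tile sits at linear offset i*row_len + j.
            slot = (i * row_len + j) % cache_elems
            if slot in seen:
                conflicts += 1
            else:
                seen.add(slot)
    return conflicts


def smallest_pad(row_len, tile_rows, tile_cols, cache_elems, max_pad=None):
    """Brute-force search for the smallest number of extra elements per row
    that leaves the tile free of self-interference, or None if no pad up
    to `max_pad` works."""
    if max_pad is None:
        max_pad = cache_elems
    for pad in range(max_pad + 1):
        if tile_conflicts(row_len + pad, tile_rows, tile_cols, cache_elems) == 0:
            return pad
    return None


if __name__ == "__main__":
    # A 32x32 tile of an array with 1024-element rows in a 1024-element cache:
    # every tile row maps to the same 32 slots.
    print(tile_conflicts(1024, 32, 32, 1024))  # 992
    print(smallest_pad(1024, 32, 32, 1024))    # 32
```

In this example the row length equals the cache size, so all 32 tile rows collide on the same 32 slots; padding each row by 32 elements spreads the rows across the whole cache. A real cache has multi-element lines and possibly higher associativity, which this sketch ignores.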