针对操作系统密集型工作负载优化指令缓存性能

Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture Pub Date : 1995-01-22 DOI:10.1109/HPCA.1995.386527

J. Torrellas, Chun Xia, Russell L. Daigle

{"title":"针对操作系统密集型工作负载优化指令缓存性能","authors":"J. Torrellas, Chun Xia, Russell L. Daigle","doi":"10.1109/HPCA.1995.386527","DOIUrl":null,"url":null,"abstract":"High instruction cache hit rates are key to high performance. One known technique to improve the hit rate of caches is to use an optimizing compiler to minimize cache interference via an improved layout of the code. This technique, however, has been applied to application code only, even though there is evidence that the operating system often uses the cache heavily and with less uniform patterns than applications. Therefore, it is unknown how well existing optimizations perform for systems code and whether better optimizations can be found. We address this problem in this paper. This paper characterizes in detail the locality patterns of the operating system code and shows that there is substantial locality. Unfortunately, caches are not able to extract much of it: rarely-executed special-case code disrupts spatial locality, loops with few iterations that call routines make loop locality hard to exploit, and plenty of loop-less code hampers temporal locality. As a result, interference within popular execution paths dominates instruction cache misses. Based on our observations, we propose an algorithm to expose these localities and reduce interference. For a range of cache sizes, associativities, lines sizes, and other organizations we show that we reduce total instruction miss rates by 31-86% (up to 2.9 absolute points). Using a simple model this corresponds to execution time reductions in the order of 12-26%. In addition, our optimized operating system combines well with optimized or unoptimized applications.<<ETX>>","PeriodicalId":330315,"journal":{"name":"Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1995-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"72","resultStr":"{\"title\":\"Optimizing instruction cache performance for operating system intensive workloads\",\"authors\":\"J. Torrellas, Chun Xia, Russell L. Daigle\",\"doi\":\"10.1109/HPCA.1995.386527\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"High instruction cache hit rates are key to high performance. One known technique to improve the hit rate of caches is to use an optimizing compiler to minimize cache interference via an improved layout of the code. This technique, however, has been applied to application code only, even though there is evidence that the operating system often uses the cache heavily and with less uniform patterns than applications. Therefore, it is unknown how well existing optimizations perform for systems code and whether better optimizations can be found. We address this problem in this paper. This paper characterizes in detail the locality patterns of the operating system code and shows that there is substantial locality. Unfortunately, caches are not able to extract much of it: rarely-executed special-case code disrupts spatial locality, loops with few iterations that call routines make loop locality hard to exploit, and plenty of loop-less code hampers temporal locality. As a result, interference within popular execution paths dominates instruction cache misses. Based on our observations, we propose an algorithm to expose these localities and reduce interference. For a range of cache sizes, associativities, lines sizes, and other organizations we show that we reduce total instruction miss rates by 31-86% (up to 2.9 absolute points). Using a simple model this corresponds to execution time reductions in the order of 12-26%. In addition, our optimized operating system combines well with optimized or unoptimized applications.<<ETX>>\",\"PeriodicalId\":330315,\"journal\":{\"name\":\"Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture\",\"volume\":\"6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1995-01-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"72\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCA.1995.386527\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.1995.386527","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 72

摘要

高指令缓存命中率是高性能的关键。提高缓存命中率的一种已知技术是使用优化编译器，通过改进代码布局来最小化缓存干扰。然而，这种技术只应用于应用程序代码，尽管有证据表明操作系统经常大量使用缓存，并且使用的模式比应用程序更不统一。因此，目前尚不清楚现有的优化对系统代码的效果如何，以及是否可以找到更好的优化。我们在本文中解决了这个问题。本文详细地描述了操作系统代码的局部性模式，并表明存在大量的局部性。不幸的是，缓存无法提取其中的大部分:很少执行的特殊情况代码破坏了空间局部性，调用例程的迭代很少的循环使循环局部性难以利用，而大量无循环的代码妨碍了时间局部性。结果，流行的执行路径中的干扰主宰了指令缓存丢失。基于我们的观察，我们提出了一种算法来暴露这些位置并减少干扰。对于缓存大小、关联、行大小和其他组织的范围，我们表明我们将总指令失误率降低了31-86%(高达2.9个绝对点数)。使用一个简单的模型，这相当于减少了12-26%的执行时间。此外，我们优化的操作系统与优化或未优化的应用程序结合得很好。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Optimizing instruction cache performance for operating system intensive workloads

High instruction cache hit rates are key to high performance. One known technique to improve the hit rate of caches is to use an optimizing compiler to minimize cache interference via an improved layout of the code. This technique, however, has been applied to application code only, even though there is evidence that the operating system often uses the cache heavily and with less uniform patterns than applications. Therefore, it is unknown how well existing optimizations perform for systems code and whether better optimizations can be found. We address this problem in this paper. This paper characterizes in detail the locality patterns of the operating system code and shows that there is substantial locality. Unfortunately, caches are not able to extract much of it: rarely-executed special-case code disrupts spatial locality, loops with few iterations that call routines make loop locality hard to exploit, and plenty of loop-less code hampers temporal locality. As a result, interference within popular execution paths dominates instruction cache misses. Based on our observations, we propose an algorithm to expose these localities and reduce interference. For a range of cache sizes, associativities, lines sizes, and other organizations we show that we reduce total instruction miss rates by 31-86% (up to 2.9 absolute points). Using a simple model this corresponds to execution time reductions in the order of 12-26%. In addition, our optimized operating system combines well with optimized or unoptimized applications.<>

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture

自引率

0.00%

发文量