Elvira Teran, Yingying Tian, Zhe Wang, Daniel A. Jiménez
Minimal disturbance placement and promotion
2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2016-03-12
DOI: 10.1109/HPCA.2016.7446065
Citations: 13
Abstract
Cache replacement policies often order blocks into distinct positions. A block is placed into a set at some initial position. A re-referenced block is promoted to a higher position, while other blocks may move to lower positions. The block in the lowest position is the candidate for replacement. Tree-based PseudoLRU is a well-known, space-efficient replacement policy that represents block positions as distinct paths in a binary tree. We find that placing or promoting one block often needlessly disturbs the non-promoted blocks. Guided by the principle of minimal disturbance, i.e., that a policy should disturb the order of non-promoted blocks as little as possible, we develop a simple modification to PseudoLRU that improves performance over previous techniques while retaining PseudoLRU's low cost. The result is a minimal disturbance placement and promotion (MDPP) policy. We first give a static formulation of MDPP and show that it outperforms LRU and PseudoLRU and matches the performance of SRRIP for both single-threaded and multi-core workloads. We then give a dynamic formulation that uses dead block prediction for placement and bypass, and show that it meets or exceeds state-of-the-art performance with lower overhead. For single-threaded workloads, dynamic MDPP matches the 5.9% speedup over LRU achieved by SHiP, the state-of-the-art policy. For multi-core workloads, dynamic MDPP gives a normalized weighted speedup of 14.3% over LRU, compared with 12.3% for SHiP, which also requires twice the storage overhead per set. We show that minimal disturbance policies can reduce the frequency of a costly read-modify-write cycle for replacement state, making them potentially suitable for future work on DRAM caches.
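The abstract's notions of "positions as paths in a binary tree" and "disturbing non-promoted blocks" can be made concrete with a small model of one set. The following is a minimal Python sketch under stated assumptions, not the MDPP policy from the paper. Several choices here are assumptions made for illustration: the heap-style bit layout, the bit-direction convention, reading a block's path as a number with the root bit most significant, the hypothetical helper `promote_to` (rewrite only the bits on a block's path so that it lands at a chosen position), and the hypothetical metric `order_disturbance` (count pairs of non-promoted blocks whose relative order flips). `promote_to` also returns how many tree bits actually changed, which serves as a rough proxy for replacement-state writes.

```python
from itertools import combinations


class TreePLRU:
    """Tree-based PseudoLRU state for one cache set (sketch; ways must be a power of two).

    Assumed layout: node 1 is the root and node n has children 2n and 2n+1.
    Assumed convention: bit 0 points to the left child, bit 1 to the right child,
    and the victim is the leaf reached by following the bits from the root.
    """

    def __init__(self, ways: int) -> None:
        if ways < 2 or ways & (ways - 1):
            raise ValueError("ways must be a power of two >= 2")
        self.ways = ways
        self.levels = ways.bit_length() - 1
        self.bits = [0] * ways  # index 0 unused; internal nodes are 1..ways-1

    def _path(self, way: int) -> list[tuple[int, int]]:
        """(node, direction) pairs from the root down to `way`; direction 1 = right child."""
        leaf = way + self.ways
        return [(leaf >> lvl, (leaf >> (lvl - 1)) & 1) for lvl in range(self.levels, 0, -1)]

    def victim(self) -> int:
        """Follow the bits from the root to the block in the lowest position."""
        node = 1
        for _ in range(self.levels):
            node = 2 * node + self.bits[node]
        return node - self.ways

    def position(self, way: int) -> int:
        """0 = victim (lowest), ways-1 = highest; each bit pointing away from `way` adds rank."""
        pos = 0
        for node, direction in self._path(way):
            pos = (pos << 1) | (self.bits[node] != direction)
        return pos

    def positions(self) -> list[int]:
        return [self.position(w) for w in range(self.ways)]

    def promote_to(self, way: int, target: int) -> int:
        """Hypothetical helper: set only the bits on `way`'s path so it lands at `target`.

        Returns the number of tree bits whose value actually changed.
        """
        changed = 0
        for depth, (node, direction) in enumerate(self._path(way)):
            away = (target >> (self.levels - 1 - depth)) & 1
            new_bit = 1 - direction if away else direction
            changed += self.bits[node] != new_bit
            self.bits[node] = new_bit
        return changed

    def promote_mru(self, way: int) -> int:
        """Classic PseudoLRU promotion: every bit on the path points away from `way`."""
        return self.promote_to(way, self.ways - 1)


def order_disturbance(before: list[int], after: list[int], moved: int) -> int:
    """Hypothetical metric: pairs of non-promoted blocks whose relative order flipped."""
    others = [w for w in range(len(before)) if w != moved]
    return sum(
        (before[a] < before[b]) != (after[a] < after[b]) for a, b in combinations(others, 2)
    )


if __name__ == "__main__":
    s = TreePLRU(4)
    before = s.positions()  # [0, 1, 2, 3]: way 0 is the victim
    s.promote_mru(0)
    print(s.positions(), order_disturbance(before, s.positions(), moved=0))  # [3, 2, 0, 1] 2

    t = TreePLRU(4)
    t.promote_to(0, 1)  # changes only the bottom bit on way 0's path
    print(t.positions(), order_disturbance(before, t.positions(), moved=0))  # [1, 0, 2, 3] 0
```

In this toy 4-way example, classic promotion of way 0 flips the relative order of two pairs of untouched blocks, because rewriting the root bit changes the most significant bit of every block's position. Moving way 0 up by a single position changes one tree bit and leaves the other three blocks in their original order. The specific placement and promotion positions that MDPP uses are defined in the paper and are not reproduced by this sketch.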