An Efficient, Self-Contained, On-chip Directory: DIR1-SISD
Mahdad Davari, Alberto Ros, Erik Hagersten, S. Kaxiras
2015 International Conference on Parallel Architecture and Compilation (PACT), published 2015-10-18
DOI: 10.1109/PACT.2015.23 (https://doi.org/10.1109/PACT.2015.23)
Citations: 14
Abstract
Directory-based cache coherence is the de facto standard for scalable shared-memory multi-core and many-core processors, and significant effort is invested in reducing its overhead. However, optimizations for directory area and for directory complexity are often antithetical to each other. Novel directory-less coherence schemes have been introduced to remove the complexity and cost associated with directories in their entirety. Such schemes, however, introduce new challenges: they transfer some of the directory complexity and functionality to the OS and use the page table and the TLBs to store data classification information. In this work we bridge the gap between directory-based and directory-less coherence schemes and propose a hybrid scheme called DIR1-SISD, which employs self-invalidation and self-downgrade as directory policies for the shared entries. DIR1-SISD allows simultaneous optimizations in area and complexity without relying on the OS. It either keeps track of a single (private) owner, or allows multiple readers and multiple writers to exist simultaneously by transferring the responsibility for their coherence to the corresponding cores. A DIR1-SISD self-contained directory cache has a unique ability to minimize eviction-induced complexities: directory entries can be evicted without maintaining inclusion with the cached data (thus avoiding the complexities of broadcasts) and without the need for a backing store. Using simulation, we show that a small, self-contained DIR1-SISD cache outperforms a traditional DIR16-NB MESI protocol with a directory cache embedded in the LLC (by 8% in execution time and 15% in traffic) and, further, outperforms a SISD protocol that relies on the OS to provide a persistent page-based directory (by 4% in execution time and 20% in traffic).
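The private/shared classification described in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation: the class `Dir1Sketch`, the method `access`, and the returned mode strings are hypothetical names, and the model assumes that the first core to touch a block becomes its tracked owner and that a later access by a different core moves the entry to a shared state in which the directory tracks no sharers. Eviction, the actual self-invalidation and self-downgrade actions at the cores, and all protocol messages are deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class State(Enum):
    PRIVATE = "private"  # exactly one owner is tracked (the "1" in DIR1)
    SHARED = "shared"    # no sharers tracked; cores keep themselves coherent


@dataclass
class Entry:
    state: State
    owner: Optional[int] = None  # only meaningful while state is PRIVATE


class Dir1Sketch:
    """Toy single-pointer directory. All names and rules are illustrative assumptions."""

    def __init__(self) -> None:
        self.entries: Dict[int, Entry] = {}

    def access(self, block: int, core: int) -> str:
        """Return the mode ("private" or "shared") in which `core` may cache `block`."""
        entry = self.entries.get(block)
        if entry is None:
            # Assumption: the first core to touch a block becomes its private owner.
            self.entries[block] = Entry(State.PRIVATE, owner=core)
            return "private"
        if entry.state is State.PRIVATE and entry.owner == core:
            return "private"
        if entry.state is State.PRIVATE:
            # A second core appears: stop tracking the owner. From here on the
            # cores are assumed to self-invalidate and self-downgrade on their own.
            entry.state = State.SHARED
            entry.owner = None
        return "shared"


directory = Dir1Sketch()
first = directory.access(0x40, core=0)   # core 0 touches the block first
second = directory.access(0x40, core=1)  # core 1 touches the same block
third = directory.access(0x40, core=0)   # core 0 returns to a now-shared block
```

In this model an entry needs only a state and at most one core identifier, which is the area argument the abstract makes; how a real design notifies the previous owner and handles directory evictions is outside the scope of the sketch.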