A Cache Coherence Protocol Using Distributed Data Dependence Violation Checking in TLS
Authors: X. Lai, Cong Liu, Zhiying Wang, Quanyou Feng
Venue: 2012 Second International Conference on Intelligent System Design and Engineering Application
Publication date: 2012-01-06
DOI: 10.1109/ISDEA.2012.386 (https://doi.org/10.1109/ISDEA.2012.386)
Current hardware implementations of thread-level speculation (TLS), in both Hydra and Renau's SESC simulator, use a global component, e.g. the L2 cache or a hardware list, to check for data dependence violations. Frequent memory accesses turn this global component into a bottleneck, and implementing and verifying it dramatically lowers the processor's frequency. In this paper, we propose a cache coherence protocol for TLS that uses a distributed data dependence violation checking mechanism. The proposed protocol extends the MESI cache coherence protocol with several mechanisms that overcome the limits of centralized violation checking. To avoid broadcasting every exposed write on the snooping bus, the protocol adds an invalidation vector to each private L1 cache to record the threads that violate a RAW (read-after-write) data dependence. It also adds a versioning priority register that is used to compare data versions. Each private L1 cache block gains a snooping bit, which indicates whether the thread holds the bus snooping right for that block. The L1 cache obtains the bus snooping right when it sets the snooping bit. It then catches every exposed read miss whose address matches the address field of the cache block. If the read miss comes from a remote core with a lower versioning priority, the L1 cache updates the invalidation vector according to the core ID on the bus. When the TLS runtime is about to commit or invalidate a thread, the L1 cache invalidates the threads whose bits are set in the invalidation vector and changes the affected cache blocks to the corresponding non-speculative state. To implement the proposed protocol, we modified SESC, an open-source cycle-accurate simulator, and used it to confirm the protocol's correctness and analyze its performance.
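The bookkeeping described in the abstract can be illustrated with a minimal behavioral sketch. It is not the authors' implementation or the SESC code. All names in it (`CacheBlock`, `L1Cache`, `set_snoop_bit`, `snoop_read_miss`, `drain_on_commit_or_invalidate`) are hypothetical. The sketch assumes that a larger integer means a higher versioning priority, that the invalidation vector holds one bit per core, and that a single `speculative` flag stands in for the speculative cache states. The abstract does not say which event sets the snooping bit, so the sketch exposes that step as an explicit call.

```python
from dataclasses import dataclass, field


@dataclass
class CacheBlock:
    tag: int                   # address field of the block
    snoop: bool = False        # snooping bit: this L1 holds the bus snooping right
    speculative: bool = True   # assumption: one flag stands in for speculative states


@dataclass
class L1Cache:
    core_id: int
    priority: int              # versioning priority register (assumed: larger = higher)
    num_cores: int
    blocks: dict = field(default_factory=dict)  # tag -> CacheBlock
    inval_vector: int = 0      # bit i set => thread on core i is marked for invalidation

    def set_snoop_bit(self, tag: int) -> None:
        """Acquire the bus snooping right for a block by setting its snooping bit."""
        block = self.blocks.setdefault(tag, CacheBlock(tag))
        block.snoop = True

    def snoop_read_miss(self, tag: int, remote_core: int, remote_priority: int) -> bool:
        """Observe an exposed read miss on the bus; return True if the vector changed."""
        block = self.blocks.get(tag)
        if block is None or not block.snoop:
            return False       # no snooping right for this address, ignore the miss
        if remote_core == self.core_id:
            return False       # only misses from remote cores are of interest
        if remote_priority < self.priority:
            # Lower-priority remote reader: record its core ID in the vector.
            self.inval_vector |= 1 << remote_core
            return True
        return False

    def drain_on_commit_or_invalidate(self) -> list:
        """Return the core IDs to invalidate and move blocks to a non-speculative state."""
        victims = [c for c in range(self.num_cores) if (self.inval_vector >> c) & 1]
        self.inval_vector = 0
        for block in self.blocks.values():
            block.snoop = False
            block.speculative = False
        return victims


if __name__ == "__main__":
    l1 = L1Cache(core_id=0, priority=3, num_cores=4)
    l1.set_snoop_bit(0x40)
    l1.snoop_read_miss(0x40, remote_core=2, remote_priority=1)  # recorded
    l1.snoop_read_miss(0x40, remote_core=1, remote_priority=5)  # higher priority, ignored
    print(l1.drain_on_commit_or_invalidate())
```

In this sketch the check is local to each L1 cache: a read miss is compared only against the blocks whose snooping bit is set, and the list of threads to invalidate is produced when the runtime commits or invalidates the owning thread, so no global component is consulted.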