Rethinking Support for Region Conflict Exceptions
Swarnendu Biswas, Rui Zhang, Michael D. Bond, Brandon Lucia
2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2019-05-01
DOI: https://doi.org/10.1109/IPDPS.2019.00116
Current shared-memory systems provide well-defined execution semantics only for data-race-free executions. A state-of-the-art technique called Conflict Exceptions (CE) extends M(O)ESI-based coherence to provide defined semantics to all program executions. However, CE incurs significant performance costs because of its need to frequently access metadata in memory. In this work, we explore designs for practical architecture support for region conflict exceptions. First, we propose an on-chip metadata cache called access information memory (AIM) to reduce memory accesses in CE. The extended design is called CE+. In spite of the AIM, CE+ stresses or saturates the on-chip interconnect and the off-chip memory network bandwidth because of its reliance on eager write-invalidation-based coherence. We explore whether detecting conflicts is potentially better suited to cache coherence based on release consistency and self-invalidation, rather than M(O)ESI-based coherence. We realize this insight in a novel architecture design called ARC. Our evaluation shows that CE+ improves the run-time performance and energy usage over CE for several applications across different core counts, but can suffer performance penalties from network saturation. ARC generally outperforms CE, and is competitive with CE+ on average while stressing the on-chip interconnect and off-chip memory network much less, showing that coherence based on release consistency and self-invalidation is well suited to detecting region conflicts.
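
To make the notion of a region conflict concrete, here is a minimal software sketch, not the hardware mechanism of CE, CE+, or ARC. It assumes a region is the code a thread runs between synchronization operations, and that a conflict is two accesses to the same address by different threads' ongoing regions, at least one of them a write. The names `ConflictDetector`, `RegionConflict`, `read`, `write`, and `end_region` are hypothetical and chosen only for illustration; the per-address tables loosely stand in for the access metadata the abstract mentions.

```python
from collections import defaultdict


class RegionConflict(Exception):
    """Raised when two ongoing regions make conflicting accesses."""


class ConflictDetector:
    """Toy, illustrative model of region conflict detection (assumption: not the paper's design)."""

    def __init__(self):
        # address -> ids of threads whose ongoing region has read / written it
        self.readers = defaultdict(set)
        self.writers = defaultdict(set)

    def read(self, tid, addr):
        # A read conflicts with a write by another thread's ongoing region.
        if self.writers[addr] - {tid}:
            raise RegionConflict(f"read of {addr:#x} by thread {tid}")
        self.readers[addr].add(tid)

    def write(self, tid, addr):
        # A write conflicts with any access by another thread's ongoing region.
        if (self.readers[addr] | self.writers[addr]) - {tid}:
            raise RegionConflict(f"write of {addr:#x} by thread {tid}")
        self.writers[addr].add(tid)

    def end_region(self, tid):
        # A synchronization operation ends the region, so its metadata is dropped.
        for table in (self.readers, self.writers):
            for tids in table.values():
                tids.discard(tid)
```

In this sketch, two threads may read the same address within their regions without a conflict, but a write by either raises `RegionConflict` until the other thread calls `end_region`. The sketch says nothing about where the metadata lives or how coherence traffic is generated, which is the cost the designs in the abstract are concerned with.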