{"title":"平衡并行科学应用的芯片多处理器缓存分区","authors":"Guang Suo","doi":"10.1109/PDCAT.2009.48","DOIUrl":null,"url":null,"abstract":"Nowadays, more and more supercomputers are built on multi-core processors with shared caches. However, the conflict accesses to shared cache from different threads or processes become a performance bottleneck for parallel applications. Cache partitioning can be used to allocate cache resources for different processes exclusively according to the demands of the processes. Conflicted accesses are avoided by restricting cache accesses to distinct private part of shared caches. This paper studies the problem of shared cache partition for balanced MPI parallel applications in CMP architecture, presenting the performance oriented cache partitioning framework, including Spatial-Level Cache Partitioning(SLCP), Time-level Cache Partitioning(TLCP) and the evaluation of them. We evaluate SLCP and TLCP based on a quad-core simulator. Experiment shows that the SLCP and TLCP outperforms traditional LRU cache replacement policy in IPC throughput and miss rate metric. Specifically, for large workloads, TLCP outperforms LRU by up to 20% and on average 8.7%.","PeriodicalId":312929,"journal":{"name":"2009 International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Cache Partitioning on Chip Multi-Processors for Balanced Parallel Scientific Applications\",\"authors\":\"Guang Suo\",\"doi\":\"10.1109/PDCAT.2009.48\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Nowadays, more and more supercomputers are built on multi-core processors with shared caches. However, the conflict accesses to shared cache from different threads or processes become a performance bottleneck for parallel applications. Cache partitioning can be used to allocate cache resources for different processes exclusively according to the demands of the processes. Conflicted accesses are avoided by restricting cache accesses to distinct private part of shared caches. This paper studies the problem of shared cache partition for balanced MPI parallel applications in CMP architecture, presenting the performance oriented cache partitioning framework, including Spatial-Level Cache Partitioning(SLCP), Time-level Cache Partitioning(TLCP) and the evaluation of them. We evaluate SLCP and TLCP based on a quad-core simulator. Experiment shows that the SLCP and TLCP outperforms traditional LRU cache replacement policy in IPC throughput and miss rate metric. Specifically, for large workloads, TLCP outperforms LRU by up to 20% and on average 8.7%.\",\"PeriodicalId\":312929,\"journal\":{\"name\":\"2009 International Conference on Parallel and Distributed Computing, Applications and Technologies\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-12-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2009 International Conference on Parallel and Distributed Computing, Applications and Technologies\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/PDCAT.2009.48\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 International Conference on Parallel and Distributed Computing, Applications and Technologies","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PDCAT.2009.48","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Cache Partitioning on Chip Multi-Processors for Balanced Parallel Scientific Applications
Nowadays, more and more supercomputers are built on multi-core processors with shared caches. However, the conflict accesses to shared cache from different threads or processes become a performance bottleneck for parallel applications. Cache partitioning can be used to allocate cache resources for different processes exclusively according to the demands of the processes. Conflicted accesses are avoided by restricting cache accesses to distinct private part of shared caches. This paper studies the problem of shared cache partition for balanced MPI parallel applications in CMP architecture, presenting the performance oriented cache partitioning framework, including Spatial-Level Cache Partitioning(SLCP), Time-level Cache Partitioning(TLCP) and the evaluation of them. We evaluate SLCP and TLCP based on a quad-core simulator. Experiment shows that the SLCP and TLCP outperforms traditional LRU cache replacement policy in IPC throughput and miss rate metric. Specifically, for large workloads, TLCP outperforms LRU by up to 20% and on average 8.7%.