平衡并行科学应用的芯片多处理器缓存分区

Guang Suo
{"title":"平衡并行科学应用的芯片多处理器缓存分区","authors":"Guang Suo","doi":"10.1109/PDCAT.2009.48","DOIUrl":null,"url":null,"abstract":"Nowadays, more and more supercomputers are built on multi-core processors with shared caches. However, the conflict accesses to shared cache from different threads or processes become a performance bottleneck for parallel applications. Cache partitioning can be used to allocate cache resources for different processes exclusively according to the demands of the processes. Conflicted accesses are avoided by restricting cache accesses to distinct private part of shared caches. This paper studies the problem of shared cache partition for balanced MPI parallel applications in CMP architecture, presenting the performance oriented cache partitioning framework, including Spatial-Level Cache Partitioning(SLCP), Time-level Cache Partitioning(TLCP) and the evaluation of them. We evaluate SLCP and TLCP based on a quad-core simulator. Experiment shows that the SLCP and TLCP outperforms traditional LRU cache replacement policy in IPC throughput and miss rate metric. Specifically, for large workloads, TLCP outperforms LRU by up to 20% and on average 8.7%.","PeriodicalId":312929,"journal":{"name":"2009 International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Cache Partitioning on Chip Multi-Processors for Balanced Parallel Scientific Applications\",\"authors\":\"Guang Suo\",\"doi\":\"10.1109/PDCAT.2009.48\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Nowadays, more and more supercomputers are built on multi-core processors with shared caches. However, the conflict accesses to shared cache from different threads or processes become a performance bottleneck for parallel applications. Cache partitioning can be used to allocate cache resources for different processes exclusively according to the demands of the processes. Conflicted accesses are avoided by restricting cache accesses to distinct private part of shared caches. This paper studies the problem of shared cache partition for balanced MPI parallel applications in CMP architecture, presenting the performance oriented cache partitioning framework, including Spatial-Level Cache Partitioning(SLCP), Time-level Cache Partitioning(TLCP) and the evaluation of them. We evaluate SLCP and TLCP based on a quad-core simulator. Experiment shows that the SLCP and TLCP outperforms traditional LRU cache replacement policy in IPC throughput and miss rate metric. Specifically, for large workloads, TLCP outperforms LRU by up to 20% and on average 8.7%.\",\"PeriodicalId\":312929,\"journal\":{\"name\":\"2009 International Conference on Parallel and Distributed Computing, Applications and Technologies\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-12-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2009 International Conference on Parallel and Distributed Computing, Applications and Technologies\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/PDCAT.2009.48\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 International Conference on Parallel and Distributed Computing, Applications and Technologies","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PDCAT.2009.48","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

如今,越来越多的超级计算机建立在多核处理器和共享缓存上。然而,不同线程或进程对共享缓存的冲突访问成为并行应用程序的性能瓶颈。缓存分区可以用来根据进程的需求为不同的进程独占地分配缓存资源。通过将缓存访问限制在共享缓存的不同私有部分来避免冲突访问。研究了CMP架构下均衡MPI并行应用的共享缓存分区问题,提出了面向性能的缓存分区框架,包括空间级缓存分区(SLCP)和时间级缓存分区(TLCP),并对它们进行了评价。我们基于四核模拟器对SLCP和TLCP进行了评估。实验表明,SLCP和TLCP在IPC吞吐量和丢包率度量方面优于传统的LRU缓存替换策略。具体来说,对于大型工作负载,TLCP的性能比LRU高出20%,平均高出8.7%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Cache Partitioning on Chip Multi-Processors for Balanced Parallel Scientific Applications
Nowadays, more and more supercomputers are built on multi-core processors with shared caches. However, the conflict accesses to shared cache from different threads or processes become a performance bottleneck for parallel applications. Cache partitioning can be used to allocate cache resources for different processes exclusively according to the demands of the processes. Conflicted accesses are avoided by restricting cache accesses to distinct private part of shared caches. This paper studies the problem of shared cache partition for balanced MPI parallel applications in CMP architecture, presenting the performance oriented cache partitioning framework, including Spatial-Level Cache Partitioning(SLCP), Time-level Cache Partitioning(TLCP) and the evaluation of them. We evaluate SLCP and TLCP based on a quad-core simulator. Experiment shows that the SLCP and TLCP outperforms traditional LRU cache replacement policy in IPC throughput and miss rate metric. Specifically, for large workloads, TLCP outperforms LRU by up to 20% and on average 8.7%.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信