Cashmere-2L: software coherent shared memory on a clustered remote-write network

R. Stets, S. Dwarkadas, N. Hardavellas, G. Hunt, L. Kontothanassis, S. Parthasarathy, M. Scott
Published in Proceedings of the Sixteenth ACM Symposium on Operating Systems Principles, October 1997. DOI: 10.1145/268998.266675
Citations: 199

Abstract

Low-latency remote-write networks, such as DEC's Memory Channel, provide the possibility of transparent, inexpensive, large-scale shared-memory parallel computing on clusters of shared memory multiprocessors (SMPs). The challenge is to take advantage of hardware shared memory for sharing within an SMP, and to ensure that software overhead is incurred only when actively sharing data across SMPs in the cluster. In this paper, we describe a two-level software coherent shared memory system, Cashmere-2L, that meets this challenge. Cashmere-2L uses hardware to share memory within a node, while exploiting the Memory Channel's remote-write capabilities to implement moderately lazy release consistency with multiple concurrent writers, directories, home nodes, and page-size coherence blocks across nodes. Cashmere-2L employs a novel coherence protocol that allows a high level of asynchrony by eliminating global directory locks and the need for TLB shootdown. Remote interrupts are minimized by exploiting the remote-write capabilities of the Memory Channel network. Cashmere-2L currently runs on an 8-node, 32-processor DEC AlphaServer system. Speedups range from 8 to 31 on 32 processors for our benchmark suite, depending on the application's characteristics. We quantify the importance of our protocol optimizations by comparing performance to that of several alternative protocols that do not share memory in hardware within an SMP, and require more synchronization. In comparison to a one-level protocol that does not share memory in hardware within an SMP, Cashmere-2L improves performance by up to 46%.
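The abstract mentions multiple concurrent writers sharing page-size coherence blocks that have home nodes. A standard way for software DSM systems to let several nodes write the same page at once is twinning and diffing. Before its first write to a page, a node saves a pristine copy, called a twin. At a release, it compares the page against the twin and sends only the changed words to the home copy. The following is a minimal sketch of that generic technique, not Cashmere-2L's implementation. It assumes 8 KB pages compared as 1024 64-bit words, and it uses a plain Python list to stand in for the home node's copy, which on the real system would be updated through Memory Channel remote writes. The class `LocalPage` and its methods are hypothetical names chosen for illustration.

```python
# Assumption: 8 KB pages (the Alpha page size), compared as 64-bit words.
PAGE_BYTES = 8192
PAGE_WORDS = PAGE_BYTES // 8


class LocalPage:
    """One node's cached copy of a shared page (hypothetical structure)."""

    def __init__(self, home: list[int]) -> None:
        self.working = list(home)  # local copy fetched from the home node
        self.twin = None           # pristine copy, created on the first write

    def write(self, index: int, value: int) -> None:
        if self.twin is None:
            # First write since the last release: save a twin so that the
            # modified words can be identified at the next release.
            self.twin = list(self.working)
        self.working[index] = value

    def flush_diff(self, home: list[int]) -> int:
        """At a release, copy only the modified words to the home copy.

        Returns the number of words sent. Because each writer sends only
        the words it changed, writers that modify disjoint words of the
        same page do not overwrite each other at the home node.
        """
        if self.twin is None:
            return 0  # page not written since the last release
        sent = 0
        for i, (new, old) in enumerate(zip(self.working, self.twin)):
            if new != old:
                home[i] = new  # stands in for a remote write to the home node
                sent += 1
        self.twin = None  # the next write after this release re-twins the page
        return sent


home = [0] * PAGE_WORDS            # home node's master copy of one page
a, b = LocalPage(home), LocalPage(home)
a.write(0, 11)                     # node A writes word 0
b.write(5, 55)                     # node B concurrently writes word 5 of the same page
print(a.flush_diff(home), b.flush_diff(home), home[0], home[5])  # → 1 1 11 55
```

Both updates survive at the home copy even though the two nodes wrote the same page without coordinating. The sketch deliberately leaves out directories, invalidation at acquires, and the hardware sharing within an SMP node that distinguish Cashmere-2L. It only shows why page-size coherence blocks can tolerate multiple concurrent writers.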