Reliability algorithms for network swapping systems with page migration
Ben Mitchell, J. Rosse, T. Newhall
2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935), 2004-09-20
DOI: 10.1109/CLUSTR.2004.1392655 (https://doi.org/10.1109/CLUSTR.2004.1392655)
Citations: 3
Abstract
Summary form only given. Network swapping systems allow individual cluster nodes with over-committed memory to use the idle memory of remote nodes as their backing store and to swap pages over the network. Without reliability support, a single node crash can affect programs running on other nodes by losing their remotely swapped page data. RAID-based reliability solutions (Patterson et al., 1988; Markatos and Dramitinos, 1996) are the most promising alternative in terms of flexibility and performance. However, two important features of our network swapping system, Nswap (Newhall et al., 2003), make direct application of RAID-based schemes impossible. First, Nswap adapts to each node's local memory load by adjusting the amount of RAM it makes available for remote swapping, which results in a variable-capacity "backing store". Second, Nswap supports migration of remotely swapped pages between cluster nodes, which occurs when a node needs to reclaim some of its RAM from Nswap for local processing. Page migration complicates reliability if, for example, two pages in the same parity group end up on the same node, because a single crash would then lose more pages than the parity can reconstruct. We present novel reliability algorithms that solve these problems. Our Parity algorithm uses dynamic parity group membership to match Nswap's dynamic nature. We show that our algorithms add minimal overhead to remote swapping.
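To make the parity-group concern concrete, here is a minimal Python sketch of generic RAID-style XOR parity. It is not Nswap's Parity algorithm, which the abstract does not specify. The function names, the tiny 8-byte pages, and the `placement` mapping are all assumptions introduced for illustration. The sketch shows two things: one lost page can be rebuilt from the surviving pages plus the parity page, and a group stops being safe against a single crash once migration places two of its members on the same node.

```python
from functools import reduce


def xor_pages(pages):
    """Return the bytewise XOR of a list of equally sized pages."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))


def recover_lost_page(surviving_pages, parity_page):
    """Rebuild the single missing data page from the survivors and the parity."""
    return xor_pages(list(surviving_pages) + [parity_page])


def group_survives_single_crash(placement):
    """placement maps each group member (data pages and parity) to a node id.

    XOR parity can rebuild only one lost member, so a single node crash is
    survivable only if no node holds two members of the same group.
    """
    nodes = list(placement.values())
    return len(nodes) == len(set(nodes))


# Hypothetical group: three 8-byte "pages" (real pages are much larger).
pages = {"p0": b"AAAAAAAA", "p1": b"BBBBBBBB", "p2": b"CCCCCCCC"}
parity = xor_pages(list(pages.values()))
placement = {"p0": "node1", "p1": "node2", "p2": "node3", "parity": "node4"}

# node2 crashes: p1 is rebuilt from p0, p2, and the parity page.
rebuilt_p1 = recover_lost_page([pages["p0"], pages["p2"]], parity)

# Migration moves p2 onto node1, which already holds p0.
placement_after_migration = dict(placement, p2="node1")
```

In this toy setup, a crash of `node1` after the migration would lose both `p0` and `p2`, which a single parity page cannot reconstruct. That is the kind of situation a scheme with dynamic group membership has to detect or avoid.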