Storage and performance optimization of long tail key access in a social network

CloudDP '13, published 2013-04-14, DOI: 10.1145/2460756.2460757

John Liang, James Luo, Mark Drayton, R. Nishtala, Richard Liu, Nick Hammer, Jason Taylor, Bill Jia

Citations: 4

Abstract

In a social network, some objects are naturally hot, such as a celebrity's Facebook page. Duplicating hot object data in each cache cluster provides fast cache access and avoids overloading the network or CPU of a single server. Duplicating cold data in every cache cluster, however, consumes significant RAM. A more storage-efficient approach is to separate hot data from cold data and duplicate only the hot data in each cache cluster within a data center. The cold data, or long-tail data, which is accessed much less frequently, is kept as a single copy in a regional cache cluster.

In this paper, we create a new sampling technique that captures all accesses to each sampled key. We then calculate the working set size of each key family to estimate its memory footprint. We introduce an important metric, the duplication factor, defined as the ratio of the sum of the individual clusters' working set sizes to the regional working set size. We analyze why some key families have a higher duplication factor.

It is important to separate hot keys from cold keys in the same key family with minimal overhead. We present a novel cache promotion algorithm based on key access probability, and we propose a probability model based on the binomial distribution that predicts the promotion probability under various promotion thresholds. Our experiment shows that by shrinking the cluster-level cache layer and provisioning a fat regional-level cache for cold data, we are able to achieve a higher combined cache hit ratio.
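
The duplication factor can be made concrete with a small sketch. The abstract does not describe the authors' sampling scheme, so this assumes hash-based key sampling (a key is kept when a stable hash of it falls in a fixed bucket), which by construction keeps every access to a sampled key in every cluster. The `Access` record, the `rate` parameter and the function names are hypothetical, chosen only for illustration.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Access(NamedTuple):
    """One cache lookup from a trace (hypothetical record layout)."""
    cluster: str  # cache cluster that served the lookup
    family: str   # key family the key belongs to
    key: str
    size: int     # value size in bytes


def is_sampled(key: str, rate: int) -> bool:
    """Keep about 1/rate of keys. The decision depends only on the key,
    so every access to a sampled key is kept, in every cluster.
    A stable digest is used because Python's built-in hash() is salted per process."""
    digest = hashlib.blake2b(key.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % rate == 0


def duplication_factors(trace: Iterable[Access], rate: int = 1) -> Dict[str, float]:
    """Per key family: sum of the per-cluster working set sizes divided by
    the regional working set size, computed over sampled keys only."""
    cluster_ws: Dict[tuple, Dict[str, int]] = defaultdict(dict)  # (family, cluster) -> key -> size
    regional_ws: Dict[str, Dict[str, int]] = defaultdict(dict)   # family -> key -> size
    for a in trace:
        if not is_sampled(a.key, rate):
            continue
        cluster_ws[(a.family, a.cluster)][a.key] = a.size
        regional_ws[a.family][a.key] = a.size

    # Sum each family's working set size over all clusters.
    cluster_total: Dict[str, int] = defaultdict(int)
    for (family, _cluster), keys in cluster_ws.items():
        cluster_total[family] += sum(keys.values())

    return {
        family: cluster_total[family] / sum(keys.values())
        for family, keys in regional_ws.items()
        if sum(keys.values()) > 0
    }


trace = [
    Access("c1", "profile", "u:1", 100),
    Access("c2", "profile", "u:1", 100),  # same key cached in a second cluster
    Access("c1", "profile", "u:2", 50),
    Access("c1", "photo", "p:9", 400),
]
print(duplication_factors(trace))  # profile: 250 / 150 ≈ 1.67, photo: 1.0
```

Because sampling keeps or drops whole keys, the per-cluster and regional working sets are scaled down by roughly the same factor, so their ratio is approximately preserved. Sampling individual requests instead would miss lightly accessed keys unevenly across clusters and bias the ratio.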

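A binomial promotion model of the kind the abstract mentions can be sketched as follows. The exact form is not given in the abstract, so this assumes that a key in the regional cache is promoted to the cluster cache once it is accessed at least T times within a window of n lookups, each of which targets the key independently with probability p. The promotion probability is then P(X >= T) = 1 - sum_{k=0}^{T-1} C(n, k) p^k (1 - p)^(n - k), with X ~ Binomial(n, p). The function name and its parameters are hypothetical.

```python
from math import comb


def promotion_probability(n: int, p: float, threshold: int) -> float:
    """P(X >= threshold) for X ~ Binomial(n, p).

    Hypothetical setting: in a window of n lookups, each lookup targets a given
    key independently with probability p, and the key is promoted from the
    regional cache to the cluster cache once it has been accessed `threshold`
    times within the window.
    """
    if threshold <= 0:
        return 1.0
    if threshold > n:
        return 0.0
    # Sum the lower tail P(X = 0) .. P(X = threshold - 1) and take the complement.
    lower_tail = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(threshold))
    return min(1.0, max(0.0, 1.0 - lower_tail))


# Raising the threshold lowers the chance that a rarely accessed key is promoted.
for t in (1, 2, 4, 8):
    print(t, round(promotion_probability(n=1000, p=0.002, threshold=t), 4))
```

The window size and access probability in the loop are illustrative values, not figures from the paper. For very large n, a library survival function (for example a binomial `sf`) avoids the cancellation in `1 - lower_tail`.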