Moolle: Fan-out control for scalable distributed data stores

2016 IEEE 32nd International Conference on Data Engineering (ICDE). Pub Date: 2016-05-16. DOI: 10.1109/ICDE.2016.7498325

Sun-Yeong Cho, A. Carter, J. Ehrlich, J. A. Jan

Pages: 1206-1217

Citations: 4

Abstract

Many online social networks horizontally partition data across data stores, so that capacity and throughput can be increased by adding server nodes. For single-key lookup queries, such as computing a member's 1st-degree connections, a client needs to send only one request to one data store. For multi-key lookup queries, such as computing a 2nd-degree network, a client must send multiple requests to multiple data stores. The number of requests needed to fulfill a multi-key lookup query grows with the number of partitions, so adding server nodes to increase capacity also increases the number of requests between the client and the data stores. This can increase query latency because of network congestion, tail latency, and CPU-bound processing. Replication-based partitioning strategies can reduce the number of requests in multi-key lookup queries. However, reducing the number of requests can degrade the performance of queries whose processing, computation, and filtering can be done by the data stores. A better system would make the number of requests in a query controllable. This paper presents Moolle, a system for controlling the number of requests in queries to scalable distributed data stores. Moolle has been implemented in the LinkedIn distributed graph service, which serves hundreds of thousands of social graph traversal queries per second. We believe that Moolle can be applied to other distributed systems that handle distributed data processing with a high volume of variable-sized requests.
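To make the fan-out growth described in the abstract concrete, the following is a minimal sketch and not Moolle's actual mechanism. It assumes a hypothetical modulo placement rule (`partition_of`) and made-up member ids, and counts how many requests a client sends when it batches keys per partition.

```python
from collections import defaultdict


def partition_of(key: int, num_partitions: int) -> int:
    """Hypothetical placement rule (an assumption): modulo partitioning."""
    return key % num_partitions


def group_by_partition(keys, num_partitions):
    """Batch keys so that each partition receives at most one request."""
    batches = defaultdict(list)
    for key in keys:
        batches[partition_of(key, num_partitions)].append(key)
    return dict(batches)


def fan_out(keys, num_partitions):
    """Number of requests a client sends for one lookup query."""
    return len(group_by_partition(keys, num_partitions))


# 1st-degree connections: one key, so one request to one data store.
single = fan_out([42], num_partitions=16)

# 2nd-degree network: one lookup per 1st-degree connection.
connections = list(range(100, 140))  # 40 made-up member ids
small_cluster = fan_out(connections, num_partitions=4)
large_cluster = fan_out(connections, num_partitions=32)

print(single, small_cluster, large_cluster)  # prints: 1 4 32
```

With the same 40 keys, going from 4 to 32 partitions raises the request count from 4 to 32, which is the growth in requests that the abstract attributes to adding server nodes.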

