Efficient Routing for Cost Effective Scale-Out Data Architectures

Ashwin Narayan, Vuk Markovic, Alejandro Morales
Published in: 2016 IEEE 24th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS)
Publication date: 2016-06-28
DOI: https://doi.org/10.1109/MASCOTS.2016.29

Abstract

In large scale-out data architectures, data are distributed and replicated across several machines. Queries and tasks submitted to such architectures are sent to a router, which determines the machines that contain the requested data. Ideally, to reduce the overall cost of analytics, the router should return the smallest set of machines required to satisfy the query. Mathematically, this can be modeled as the set cover problem, which is NP-hard. Given the large number of queries arriving in real time, it is often impractical to compute a set cover for each incoming query in order to route it. In this paper, we propose a novel technique to speed up the routing of a large number of real-time queries while minimizing the number of machines that each query touches (the query span). We demonstrate that by analyzing the correlation between known queries and performing query clustering, we can reduce the set cover computation time, thereby significantly speeding up the routing of unknown queries. Experiments show that our incremental set cover-based routing is 2.5 times faster and returns on average 50% fewer machines per query when compared to repeated greedy set cover and baseline routing techniques.
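
To make the routing problem concrete, the following is a minimal Python sketch, not the authors' implementation. `greedy_route` is the standard greedy set cover heuristic, applied once per query as in the "repeated greedy" baseline the abstract mentions. `route_with_reuse` illustrates, under our own assumptions, the general idea of seeding a new query with the machines chosen for a similar known query; the Jaccard similarity measure, the function names, and the `PLACEMENT` data are all hypothetical and do not come from the paper.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical placement: machine name -> data items stored on it (replicated).
PLACEMENT: Dict[str, Set[str]] = {
    "m1": {"a", "b"},
    "m2": {"b", "c"},
    "m3": {"a", "b", "c", "d"},
    "m4": {"d", "e"},
}


def greedy_route(query_items: Set[str], placement: Dict[str, Set[str]]) -> List[str]:
    """Greedy set cover: repeatedly pick the machine that holds the most
    still-uncovered items. Returns the machines in the order chosen."""
    uncovered = set(query_items)
    chosen: List[str] = []
    while uncovered:
        # Iterating over sorted names makes tie-breaking deterministic:
        # max() keeps the first machine that reaches the best count.
        best = max(sorted(placement), key=lambda m: len(placement[m] & uncovered))
        gained = placement[best] & uncovered
        if not gained:
            raise ValueError(f"no machine holds: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gained
    return chosen


def route_with_reuse(
    query_items: Set[str],
    placement: Dict[str, Set[str]],
    known: List[Tuple[Set[str], List[str]]],
) -> List[str]:
    """Assumed illustration of reuse: start from the cover of the most similar
    known query (Jaccard similarity), keep only machines that still contribute,
    then run greedy set cover on whatever remains uncovered."""
    best_cover: List[str] = []
    best_sim = 0.0
    for items, cover in known:
        union = items | query_items
        sim = len(items & query_items) / len(union) if union else 0.0
        if sim > best_sim:
            best_sim, best_cover = sim, cover

    uncovered = set(query_items)
    chosen: List[str] = []
    for machine in best_cover:
        gained = placement[machine] & uncovered
        if gained:
            chosen.append(machine)
            uncovered -= gained
    # Only the leftover items need a fresh set cover computation.
    return chosen + greedy_route(uncovered, placement)


# A known query and its previously computed cover.
known_queries = [({"a", "c", "d"}, greedy_route({"a", "c", "d"}, PLACEMENT))]
print(greedy_route({"a", "e"}, PLACEMENT))
print(route_with_reuse({"a", "d", "e"}, PLACEMENT, known_queries))
```

In this sketch the query span is simply the length of the returned list. The reuse variant runs the greedy loop only on the items the cached cover misses, which is the kind of saving the abstract attributes to exploiting correlation between known queries.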