Coordinated Resource Management for Large Scale Interactive Data Query Systems

Wei Yan, Yuan Xue
2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 677-686
Published: 2015-05-04
DOI: 10.1109/CCGrid.2015.149 (https://doi.org/10.1109/CCGrid.2015.149)
Citations: 1

Abstract

Interactive ad hoc querying of massive datasets has recently gained significant traction. Massively parallel data query and analysis frameworks (e.g., Dremel, Impala) have been built and deployed to support SQL-like queries over distributed, partitioned data in a cluster environment. In these frameworks, the execution of each query is converted into a set of coordinated tasks, including data retrieval, intermediate result computation and transfer, and result aggregation. To support a high request rate of concurrent interactive queries, coordinated management of multiple cluster resources (e.g., bandwidth, CPU, memory) is critical. In this paper, we investigate this resource management problem using a utility-based optimization framework. Our goal is to optimize resource utilization while maintaining fairness among different types of queries. We present a price-based algorithm that achieves this optimization objective. We implement our algorithm in the open source Impala system and conduct a set of experiments in a cluster environment using the TPC-DS workload. Experimental results show that our coordinated resource management solution increases the aggregate utility by at least 15.4% compared with a simple fair resource sharing mechanism, and by 63.5% compared with a FIFO resource management mechanism.
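
The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a generic price-based (dual decomposition) allocation over several resources can work; it is not the paper's actual method. It assumes a logarithmic utility per query class, and the function name, parameters, step size, and the per-unit demand matrix are all hypothetical choices made for illustration. Each resource carries a price that rises when the resource is overloaded and falls otherwise, and each query class picks the rate that maximizes its utility minus its resource cost.

```python
def price_based_allocation(weights, demand, capacity, step=0.01, iters=5000):
    """Illustrative price-based allocation (assumed model, not the paper's).

    weights[i]   -- utility weight of query class i, with U_i(x) = w_i * log(x)
    demand[i][r] -- units of resource r consumed per unit rate of class i
    capacity[r]  -- total available units of resource r (e.g. CPU, bandwidth)
    Returns (rates, prices).
    """
    n, m = len(weights), len(capacity)
    prices = [1.0] * m
    rates = [0.0] * n
    for _ in range(iters):
        # Each class maximizes w_i*log(x) - x*cost_i, which gives x = w_i / cost_i.
        for i in range(n):
            cost = sum(demand[i][r] * prices[r] for r in range(m))
            rates[i] = weights[i] / max(cost, 1e-9)  # guard against zero cost
        # Each resource moves its price in proportion to its excess load,
        # keeping the price non-negative.
        for r in range(m):
            load = sum(demand[i][r] * rates[i] for i in range(n))
            prices[r] = max(0.0, prices[r] + step * (load - capacity[r]))
    return rates, prices
```

With a logarithmic utility, the allocation this iteration settles on is proportionally fair across query classes, which is one common way to combine high utilization with fairness; other utility functions would yield different trade-offs.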