Optimizing the cost-performance tradeoff for geo-distributed data analytics with uncertain demand
Wenxin Li, Renhai Xu, Heng Qi, Keqiu Li, Xiaobo Zhou
2017 IEEE/ACM 25th International Symposium on Quality of Service (IWQoS), 2017-06-14
DOI: 10.1109/IWQoS.2017.7969120 (https://doi.org/10.1109/IWQoS.2017.7969120)
Citations: 8
Abstract
In the era of global-scale services, analytical queries are performed on datasets that span multiple data centers (DCs). Because inter-DC bandwidth is scarce and expensive, various methods have been proposed to reduce either the traffic cost or the completion time of these queries. However, current methods make no attempt to maximize the number of successfully served query requests. Moreover, most of them rely on unrealistic assumptions, such as that analytical queries are repeated or known in advance. In this paper, we aim to characterize and optimize the cost-performance tradeoff for geo-distributed data analytics. Our objectives are twofold: (1) to minimize the inter-DC traffic cost of serving geo-distributed analytics under uncertain query demand, and (2) to maximize system throughput, measured as the number of query requests that can be served successfully with a guaranteed queuing delay. To achieve these objectives, we use Lyapunov optimization techniques to design a two-timescale online control framework. Without prior knowledge of future query requests, this framework makes online decisions on input data placement and admission control of query requests. Extensive trace-driven simulations demonstrate that our framework reduces inter-DC traffic cost, improves system throughput, and guarantees a maximum delay for each query request.
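The abstract does not give the control rules themselves, so the following is only a minimal sketch of the generic Lyapunov drift-plus-penalty pattern that such frameworks usually build on, not the authors' algorithm. It models a single query queue on one timescale and leaves out data placement entirely. All names (`admit`, `serve`, `simulate`, the tradeoff weight `v`, `unit_cost`, `max_rate`) are hypothetical and chosen for illustration.

```python
from typing import Sequence, Tuple


def admit(backlog: float, arrivals: float, v: float) -> float:
    """Threshold admission: accept all arrivals while backlog <= v, else none."""
    return arrivals if backlog <= v else 0.0


def serve(backlog: float, unit_cost: float, v: float, max_rate: float) -> float:
    """Serve at full rate only when the backlog outweighs the weighted cost."""
    return max_rate if backlog > v * unit_cost else 0.0


def simulate(
    arrival_trace: Sequence[float],
    cost_trace: Sequence[float],
    v: float,
    max_rate: float,
) -> Tuple[float, float, float]:
    """Run the per-slot rules over a trace.

    Returns (total admitted requests, total traffic cost, peak backlog).
    """
    backlog = 0.0
    total_admitted = 0.0
    total_cost = 0.0
    peak = 0.0
    for arrivals, unit_cost in zip(arrival_trace, cost_trace):
        admitted = admit(backlog, arrivals, v)
        # Never serve more than what is actually queued.
        served = min(serve(backlog, unit_cost, v, max_rate), backlog)
        # Queue update: Q(t+1) = Q(t) - served(t) + admitted(t).
        backlog = backlog - served + admitted
        total_admitted += admitted
        total_cost += served * unit_cost
        peak = max(peak, backlog)
    return total_admitted, total_cost, peak


if __name__ == "__main__":
    admitted, cost, peak = simulate([3.0] * 4, [1.0] * 4, v=5.0, max_rate=2.0)
    print(admitted, cost, peak)
```

In this sketch, a larger `v` favors lower cost and higher admission at the price of a longer queue. Because requests are admitted only while the backlog is at most `v`, the backlog never exceeds `v` plus the largest single-slot arrival, which is the kind of bounded-queue property that a delay guarantee can be built on.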