Zhiguang Tang, Haihang Zhou, Yujin Zhu, Run Tian, Jianguo Yao
Title: Quantitative Availability Analysis of Hierarchical Datacenter under Power Oversubscription
Published in: 2017 IEEE International Conference on Smart Computing (SMARTCOMP)
Publication date: 2017-05-29
DOI: 10.1109/SMARTCOMP.2017.7947039 (https://doi.org/10.1109/SMARTCOMP.2017.7947039)
Citations: 1
Abstract
For reasons of economy and efficiency, modern data centers oversubscribe their power supplies in order to deploy as many servers as possible. Oversubscription exploits the variation in load among servers to modulate power demand. Nevertheless, power oversubscription threatens system availability: the data center may collapse as a result of overloading. Current solutions usually focus on managing the data center workload to avoid periods of peak power demand. However, none of the current research considers failures of power or utility components, which may undermine the effectiveness of these strategies. Nor can it answer the question of how many servers should be deployed in a data center under an availability constraint. In this paper, we propose a quantitative availability analysis of hierarchical data centers under power oversubscription. To this end, we use Markov chains and Stochastic Reward Nets (SRNs) to model the failure and repair processes of data center components. The servers at the bottom level are distributed across two pools: a main pool of running servers and a backup pool of turned-off servers. Whenever a running server fails, a server is migrated from the backup pool to the main pool. SRNs model these two pools, and Markov chains model the upper-level components. The evaluation is based on real-life Google and Wikipedia traces. The results show the relationship between oversubscription and data center availability, which can guide data center operators in choosing an appropriate oversubscription ratio under an availability constraint.
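To make the modeling idea concrete, the following is a minimal sketch and not the authors' model. It assumes a two-state Markov chain for each upper-level component and replaces the SRN for the server pools with a simple birth-death chain. The function names, the single repair crew, the instantaneous migration from the backup pool, the independence between levels, and all rates are illustrative assumptions that do not come from the paper.

```python
def component_availability(failure_rate, repair_rate):
    """Steady-state availability of a two-state (up/down) Markov chain."""
    return repair_rate / (failure_rate + repair_rate)


def pool_availability(n_main, n_backup, failure_rate, repair_rate):
    """Steady-state probability that n_main servers are running.

    Birth-death chain over k = number of failed servers (assumed model):
      - only running servers fail; backup servers are turned off,
      - a failed server is replaced at once from the backup pool if possible,
      - one repair crew fixes failed servers at rate repair_rate.
    """
    total = n_main + n_backup
    weights = [1.0]  # unnormalized stationary probabilities, state k = 0
    for k in range(total):
        running = min(n_main, total - k)
        # Balance equation: pi[k+1] * repair_rate = pi[k] * running * failure_rate
        weights.append(weights[-1] * running * failure_rate / repair_rate)
    # The main pool is full as long as at most n_backup servers have failed.
    return sum(weights[: n_backup + 1]) / sum(weights)


def hierarchy_availability(upper_components, pool):
    """Series composition: every upper-level component and the pool must be up.

    Assumes independent failures across components.
    """
    availability = pool
    for failure_rate, repair_rate in upper_components:
        availability *= component_availability(failure_rate, repair_rate)
    return availability


if __name__ == "__main__":
    # Hypothetical (failure, repair) rates per hour for two upper-level components.
    upper = [(1e-4, 0.1), (5e-5, 0.05)]
    for backup in range(4):
        pool = pool_availability(20, backup, failure_rate=1e-3, repair_rate=0.1)
        print(backup, hierarchy_availability(upper, pool))
```

Sweeping the number of servers or backup servers in such a sketch illustrates the kind of trade-off the abstract describes: each added backup server raises the availability of the pool, while the upper-level components bound the availability of the whole hierarchy.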