{"title":"Reducing service failures by failure and workload aware load balancing in SaaS clouds","authors":"A. Roy, R. Ganesan, D. Dash, S. Sarkar","doi":"10.1109/DSNW.2013.6615511","DOIUrl":null,"url":null,"abstract":"SLA violations are typically viewed as service failures. If service fails once, it will fail again unless remedial action is taken. In a virtualized environment, a common remedial action is to restart or reboot a virtual machine (VM). In this paper we present, a VM live-migration policy that is aware of SLA threshold violations of workload response time, physical machine (PM) and VM utilization as well as availability violations at the PM and VM. In the migration policy we take into account PM failures and VM (software) failures as well as workload features such as burstiness (coefficient of variation or CoV >1) which calls for caution during the selection of target PM when migrating these workloads. The proposed policy also considers migration of a VM when the utilization of the physical machine hosting the VM approaches its utilization threshold. We propose an algorithm that detects proactive triggers for remedial action, selects a VM (for migration) and also suggests a possible target PM. 
We show the efficacy of our proposed approach by plotting the decrease in the number of SLA violations in a system using our approach over existing approaches that do not trigger migration in response to non-availability related SLA violations, via discrete event simulation of a relevant case study.","PeriodicalId":377784,"journal":{"name":"2013 43rd Annual IEEE/IFIP Conference on Dependable Systems and Networks Workshop (DSN-W)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 43rd Annual IEEE/IFIP Conference on Dependable Systems and Networks Workshop (DSN-W)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DSNW.2013.6615511","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 9
Abstract
SLA violations are typically viewed as service failures. If a service fails once, it will fail again unless remedial action is taken. In a virtualized environment, a common remedial action is to restart or reboot a virtual machine (VM). In this paper we present a VM live-migration policy that is aware of SLA threshold violations of workload response time and of physical machine (PM) and VM utilization, as well as availability violations at the PM and VM. The migration policy takes into account PM failures and VM (software) failures, as well as workload features such as burstiness (coefficient of variation, or CoV, > 1), which call for caution when selecting the target PM for these workloads. The proposed policy also considers migrating a VM when the utilization of the physical machine hosting it approaches its utilization threshold. We propose an algorithm that detects proactive triggers for remedial action, selects a VM for migration, and suggests a possible target PM. We show the efficacy of the proposed approach via discrete event simulation of a relevant case study, plotting the decrease in the number of SLA violations in a system using our approach relative to existing approaches that do not trigger migration in response to non-availability-related SLA violations.
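The abstract does not give the algorithm itself, so the following is only a minimal sketch of the three ideas it names: flagging a workload as bursty when its CoV exceeds 1, firing a proactive trigger as PM utilization approaches its threshold, and choosing a target PM more cautiously for bursty workloads. All function names, the `margin` and `headroom` parameters, and the least-loaded selection rule are hypothetical choices made for illustration and are not taken from the paper.

```python
from statistics import mean, pstdev


def coefficient_of_variation(samples):
    """Return standard deviation divided by mean for a sequence of samples."""
    mu = mean(samples)
    if mu == 0:
        raise ValueError("mean is zero; CoV is undefined")
    return pstdev(samples) / mu


def is_bursty(samples):
    """A workload is treated as bursty when its CoV exceeds 1 (as in the abstract)."""
    return coefficient_of_variation(samples) > 1.0


def should_trigger_migration(pm_utilization, threshold, margin=0.05):
    """Proactive trigger: fire once utilization comes within `margin` of the threshold.

    `margin` is an assumed tuning knob, not a value from the paper.
    """
    return pm_utilization >= threshold - margin


def pick_target_pm(pm_utilizations, vm_demand, threshold, bursty, headroom=0.10):
    """Return the least-loaded PM that can absorb the VM, or None if none can.

    pm_utilizations maps a PM name to its current utilization in [0, 1].
    For bursty workloads an extra `headroom` (assumed value) is kept free.
    """
    limit = threshold - (headroom if bursty else 0.0)
    candidates = {
        pm: util
        for pm, util in pm_utilizations.items()
        if util + vm_demand <= limit
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)
```

For instance, with inter-arrival times `[0.1, 0.1, 0.1, 10]` the workload is classified as bursty, and `pick_target_pm` then rejects any PM that would end up within `headroom` of its utilization threshold after the migration. A real policy would also need to weigh PM and VM failure history, which this sketch leaves out.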