Paravirtualization for Scalable Kernel-Based Virtual Machine (KVM)
K. Raghavendra, S. Vaddagiri, N. Dadhania, J. Fitzhardinge
2012 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM), 20 November 2012. DOI: 10.1109/CCEM.2012.6354619
Citations: 10
Abstract
In a multi-CPU virtual machine (VM), virtual CPUs (VCPUs) are not guaranteed to be scheduled simultaneously. Operating system (OS) constructs that busy-wait (mainly spin locks and TLB shoot-down) are written on the assumption that they run on bare metal; under virtualization they waste a great deal of CPU time, degrading performance. For example, if a VCPU holding a spin lock is preempted by the host scheduler (lock-holder preemption, LHP), the other VCPUs waiting to acquire that spin lock waste many CPU cycles. Ticket-based spin lock implementations make this worse, because they require the next eligible VCPU to be running before the lock can be acquired. Similarly, the remote TLB flush APIs busy-wait for other VCPUs to flush their TLBs. These problems raise synchronization latency for workloads running inside a VM, and they become even worse in massively over-committed environments such as the cloud. One existing solution is the hardware-supported Pause Loop Exiting (PLE) mechanism, which detects such busy-wait loops inside a VCPU and automatically forces a VCPU exit. An alternative is gang scheduling, which tries to ensure that a VM's VCPUs are scheduled simultaneously. Both implementations, however, suffer from scalability problems and are ill suited to cloud environments. Paravirtualization is the best approach: the guest OS is made aware that it is running in a virtualized environment and optimizes its busy-waiting, while the host OS coordinates with the guest to bring further performance benefits. This paper discusses paravirtualized ticket spin locks, in which a VCPU waiting for a spin lock sleeps, and the unlocker identifies the next eligible lock-holder VCPU and wakes it up. In the paravirtualized remote TLB flush, a VCPU does not wait for other VCPUs that are sleeping; instead, each sleeping VCPU flushes its TLB the next time it runs. Results show that, on a non-PLE machine, these solutions bring large speedups in over-committed guest scenarios.
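To make the first mechanism concrete, below is a minimal C sketch of a paravirtualized ticket spin lock along the lines the abstract describes: a waiter spins briefly, then sleeps via a hypervisor call, and the unlocker wakes exactly the VCPU holding the next ticket. The names hv_wait()/hv_kick(), the SPIN_THRESHOLD value, and the lock layout are illustrative assumptions, not the paper's actual interface (Linux/KVM fills these roles with a halt on the waiter side and the KVM_HC_KICK_CPU hypercall on the unlocker side).

```c
/*
 * Minimal sketch of a paravirtualized ticket spin lock.
 * A waiter spins for SPIN_THRESHOLD iterations, then sleeps; the
 * unlocker wakes exactly the VCPU that holds the next ticket.
 */
#include <stdatomic.h>

#define SPIN_THRESHOLD 1024   /* iterations to spin before sleeping */

struct pv_ticket_lock {
    atomic_ushort next;       /* next ticket to hand out */
    atomic_ushort owner;      /* ticket that currently owns the lock */
};

/* Stubs for the hypervisor interface (illustrative names only).
 * In a real guest, hv_wait() would block this VCPU until kicked and
 * hv_kick() would wake the VCPU waiting on `ticket`. */
static void hv_wait(struct pv_ticket_lock *lock, unsigned short ticket)
{
    (void)lock; (void)ticket;
}
static void hv_kick(struct pv_ticket_lock *lock, unsigned short ticket)
{
    (void)lock; (void)ticket;
}

void pv_ticket_lock_acquire(struct pv_ticket_lock *lock)
{
    unsigned short me = atomic_fetch_add(&lock->next, 1);

    for (;;) {
        /* Spin briefly so the uncontended, short-hold case stays cheap. */
        for (int i = 0; i < SPIN_THRESHOLD; i++) {
            if (atomic_load(&lock->owner) == me)
                return;
        }
        /* Lock holder is likely preempted: sleep instead of burning CPU. */
        hv_wait(lock, me);
    }
}

void pv_ticket_lock_release(struct pv_ticket_lock *lock)
{
    unsigned short next_owner = atomic_load(&lock->owner) + 1;

    atomic_store(&lock->owner, next_owner);
    /* Wake only the next-in-line waiter, preserving ticket (FIFO) order. */
    hv_kick(lock, next_owner);
}
```

The key design point is that the unlocker names a single ticket, so only the next eligible waiter is woken and the FIFO fairness of ticket locks survives without requiring that VCPU to already be running. A similarly hedged sketch of the paravirtualized remote TLB flush follows: the initiator flushes running VCPUs as usual but only flags sleeping VCPUs, which flush for themselves on their next entry. All names here (vcpu_state, ipi_flush_tlb, local_flush_tlb, pv_vcpu_resume) are hypothetical stand-ins, not real KVM/Linux interfaces.

```c
/*
 * Sketch of the paravirtualized remote TLB flush. The initiator sends
 * flush IPIs only to VCPUs that are actually running; for sleeping
 * (preempted) VCPUs it just sets a flag, and each such VCPU flushes
 * its own TLB when it is scheduled again.
 */
#include <stdatomic.h>
#include <stdbool.h>

#define NR_VCPUS 8

struct vcpu_state {
    atomic_bool running;        /* maintained by the hypervisor */
    atomic_bool flush_pending;  /* set by remote flush initiators */
};

static struct vcpu_state vcpus[NR_VCPUS];

static void ipi_flush_tlb(int cpu) { (void)cpu; } /* stub: IPI + wait for ack */
static void local_flush_tlb(void) { }             /* stub: flush this CPU's TLB */

/* Initiator side: flush remote TLBs without waiting on sleeping VCPUs. */
void pv_flush_tlb_others(int self)
{
    for (int cpu = 0; cpu < NR_VCPUS; cpu++) {
        if (cpu == self)
            continue;
        if (atomic_load(&vcpus[cpu].running)) {
            ipi_flush_tlb(cpu);   /* running VCPU: flush now, as usual */
        } else {
            /* Sleeping VCPU: defer. It cannot use stale translations
             * while descheduled, and it flushes before running again. */
            atomic_store(&vcpus[cpu].flush_pending, true);
        }
    }
}

/* Guest re-entry path: honor any flush deferred while we slept. */
void pv_vcpu_resume(int self)
{
    if (atomic_exchange(&vcpus[self].flush_pending, false))
        local_flush_tlb();
}
```

Note that this sketch ignores the race in which a VCPU starts running between the `running` check and the `flush_pending` store; a real implementation must close that window, for example by updating the per-VCPU state under the hypervisor's control.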