VM Live Migration At Scale

Proceedings of the 14th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments. Pub Date: 2018-03-25. DOI: 10.1145/3186411.3186415

Adam Ruprecht, Danny M Jones, Dmitry Shiraev, G. Harmon, Maya Spivak, Michael Krebs, M. Baker-Harvey, Tyler Sanderson


Citations: 39

Abstract

Uninterrupted uptime is a critical aspect of Virtual Machines (VMs) offered by cloud hosting providers. Google's VMs run on top of rapidly changing infrastructure: we regularly update hardware and host software, and we must quickly respond to failing hardware. Frequent change is critical to both development velocity (deploying new versions of services and infrastructure) and the ability to respond rapidly to defects, including critical security fixes. Typically these updates would be disruptive, resulting in VM termination or restart. In this paper we present how we use VM live migration at scale to eliminate this disruption with minimal impact to the guest, performing over 1,000,000 migrations monthly in our production fleet, with 50 ms median blackout and 300 ms 99th-percentile blackout.

