Fault-Tolerant Containers Using NiLiCon
Diyu Zhou, Y. Tamir
2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 1082-1091, published 2020-05-01
DOI: 10.1109/IPDPS47924.2020.00114 (https://doi.org/10.1109/IPDPS47924.2020.00114)

Many services deployed in the cloud require high reliability and must thus survive machine failures. Providing such fault tolerance transparently, without requiring application modifications, has motivated extensive research on replicating virtual machines (VMs). Cloud computing typically relies on VMs or containers to provide an isolation and multitenancy layer. Containers have advantages over VMs: smaller size, faster startup, and no need to manage updates of multiple VMs. This paper reports on the design, implementation, and evaluation of NiLiCon, a transparent container replication mechanism for fault tolerance. To the best of our knowledge, NiLiCon is the first implementation of container replication, demonstrating that it can be used for transparent deployment of critical services in the cloud.

NiLiCon is based on high-frequency asynchronous incremental checkpointing to a warm spare, as previously used for VMs. The challenge in accomplishing this is that, compared to VMs, there is much tighter coupling between the container state and the state of the underlying platform. NiLiCon meets this challenge, eliminating the need to deploy services in VMs, with performance overheads that are competitive with those of similar VM replication mechanisms. Specifically, with the seven benchmarks used in the evaluation, the performance overhead of NiLiCon is in the range of 19% to 67%. For fail-stop faults, the recovery rate is 100%.
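
As a rough illustration of incremental checkpointing to a warm spare, the toy Python sketch below sends only the state that changed since the previous checkpoint and lets the spare resume from the last checkpoint it received. It is not NiLiCon's implementation: the `Primary` and `WarmSpare` classes, the key-value model of state, and the synchronous transfer are all assumptions made for illustration. Asynchronous overlap of transfer with execution and the handling of platform-coupled state are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class WarmSpare:
    """Hypothetical spare: holds the most recent checkpoint it received."""
    state: dict = field(default_factory=dict)
    epoch: int = 0

    def apply(self, epoch, delta):
        # Merge only the entries that changed since the previous checkpoint.
        self.state.update(delta)
        self.epoch = epoch


@dataclass
class Primary:
    """Hypothetical primary: tracks which keys changed in the current epoch."""
    state: dict = field(default_factory=dict)
    dirty: set = field(default_factory=set)
    epoch: int = 0

    def write(self, key, value):
        self.state[key] = value
        self.dirty.add(key)

    def checkpoint(self, spare):
        # Collect just the dirty entries, then start a fresh epoch.
        delta = {k: self.state[k] for k in self.dirty}
        self.dirty.clear()
        self.epoch += 1
        spare.apply(self.epoch, delta)
        return len(delta)  # number of entries transferred


primary, spare = Primary(), WarmSpare()
primary.write("a", 1)
primary.write("b", 2)
sent_first = primary.checkpoint(spare)   # both keys are new, so both are sent
primary.write("b", 3)
sent_second = primary.checkpoint(spare)  # only "b" changed since the last epoch
primary.write("c", 4)                    # written after the last checkpoint
recovered = dict(spare.state)            # what the spare would resume from
```

In this sketch the write to `"c"` is lost on failover because it occurred after the last checkpoint; a shorter checkpoint interval narrows that window at the cost of more frequent transfers.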