Abhishek K. Gupta, Richard P. Spillane, Wenguang Wang, Maxime Austruy, Vahid Fereydouny, C. Karamanolis
Title: Hybrid Cloud Storage: Bridging the Gap between Compute Clusters and Cloud Storage
Journal: ACM SIGOPS Oper. Syst. Rev., volume 81 1, pages 48-53
Published: 2017-09-11
DOI: 10.1145/3139645.3139653 (https://doi.org/10.1145/3139645.3139653)
Citations: 2
Abstract
Thanks to the compelling economics of public cloud storage, the trend in the IT industry is to move the bulk of analytics and application data to services such as AWS S3 and Google Cloud Storage. At the same time, customers want to continue accessing and analyzing much of that data using applications that run on compute clusters, which may reside either on public clouds or on-premises. For VMware customers, those clusters run vSphere (sometimes with vSAN) on-premises and in the future may utilize SDDCaaS. Cloud storage exhibits high latencies and is not appropriate for direct use by applications. A key challenge for these use cases is determining which subset of the typically huge data sets needs to be moved into the primary storage tier of the compute clusters.
This paper introduces a novel approach for creating hybrid cloud storage that allows customers to utilize the fast primary storage of their compute clusters as a caching tier in front of a slow secondary storage tier. This approach can be completely transparent, requiring no changes to the application. To achieve this, we extended VDFS [16], a POSIX-compliant scale-out filesystem, with the concept of caching-tier volumes.
VDFS caching-tier volumes resemble regular file system volumes, but they fault in data from a cloud storage back-end on first access. Cached data are persisted on fast primary storage close to the compute cluster, such as VMware's vSAN.
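The fault-in-on-first-access behavior can be illustrated with a minimal sketch. This is not the VDFS implementation: the class name `CachingTierVolume`, its `read` method, and the use of plain dictionaries as stand-ins for the cloud back-end and the primary storage tier are all assumptions made for illustration.

```python
class CachingTierVolume:
    """Toy model of a caching-tier volume (hypothetical, not the VDFS API)."""

    def __init__(self, cloud_backend):
        self.cloud = cloud_backend   # slow secondary tier; a dict stands in for it
        self.primary = {}            # fast primary tier; a dict stands in for it
        self.cloud_fetches = 0       # number of round trips to the back-end

    def read(self, path):
        if path not in self.primary:
            # First access: fault the data in from the cloud back-end
            # and keep a copy on primary storage.
            self.primary[path] = self.cloud[path]
            self.cloud_fetches += 1
        # Subsequent accesses are served from primary storage.
        return self.primary[path]


cloud = {"/data/a.csv": b"1,2,3\n"}
vol = CachingTierVolume(cloud)
first = vol.read("/data/a.csv")    # goes to the cloud back-end
second = vol.read("/data/a.csv")   # served from the local copy
```

In this sketch the application only ever calls `read`, which mirrors the transparency goal: the caller does not need to know whether the data was already cached.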
Caching-tier volumes use a write-back approach. The enterprise features of the primary storage ensure the persistence and fault tolerance of new or updated data. Write-back from the primary to cloud storage is managed using an efficient change-tracking mechanism built into VDFS called exo-clones [18].
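As a rough illustration of write-back with change tracking, the sketch below records which paths have changed and uploads only those on flush. It deliberately substitutes a simple dirty set for exo-clones, whose actual mechanism is not described here; `WriteBackVolume`, `write`, and `flush` are hypothetical names, and dictionaries stand in for both storage tiers.

```python
class WriteBackVolume:
    """Toy write-back cache with a dirty set (a stand-in, not exo-clones)."""

    def __init__(self, cloud_backend):
        self.cloud = cloud_backend   # slow secondary tier; a dict stands in for it
        self.primary = {}            # fast primary tier; a dict stands in for it
        self.dirty = set()           # paths changed since the last flush

    def write(self, path, data):
        # The write completes against primary storage only; in the real
        # system the primary tier provides persistence and fault tolerance.
        self.primary[path] = data
        self.dirty.add(path)

    def flush(self):
        """Upload only the changed paths and return them in sorted order."""
        uploaded = sorted(self.dirty)
        for path in uploaded:
            self.cloud[path] = self.primary[path]
        self.dirty.clear()
        return uploaded


cloud = {}
vol = WriteBackVolume(cloud)
vol.write("/out/r.txt", b"v1")
vol.write("/out/r.txt", b"v2")     # overwrites locally; still one dirty path
cloud_before_flush = dict(cloud)   # nothing has reached the cloud yet
uploaded = vol.flush()
```

Because writes are acknowledged by the primary tier, repeated updates to the same file are coalesced locally and only the latest version is sent to the cloud when the flush runs.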
This paper outlines the architecture and implementation of caching-tier volumes on VDFS and reports on an initial evaluation of the current prototype.