Kubestorage: A Cloud Native Storage Engine for Massive Small Files

2019 6th International Conference on Behavioral, Economic and Socio-Cultural Computing (BESC). Pub Date: 2019-10-01. DOI: 10.1109/BESC48373.2019.8962995

Fuxin Liu, Jingwei Li, Yihong Wang, Lin Li


Citations: 2

Abstract

Cloud Native, an emerging computing infrastructure, has become a new trend in cloud computing, especially after the development of containerization technologies such as Docker and LXD and of orchestration systems for them such as Kubernetes and Swarm. With the growing popularity of Cloud Native, the following problems have arisen: (i) Most Cloud Native applications are designed to make full use of the cloud platform, but their file storage has not been fully optimized for it. (ii) The traditional file system is designed as a utility for storing and retrieving files and is usually built into the operating system kernel. When it is deployed at large scale, for example as a network storage server that is shared by thousands of computing instances and stores millions of files, it becomes slow and even unstable. (iii) Most storage solutions use metadata for faster tracking of files, but the metadata itself takes up a lot of space, and its capacity is usually limited. If the file system stores metadata directly on the hard disk without caching, tracking massive numbers of small files becomes much slower. (iv) Traditional object storage solutions do not provide enough features, such as caching and automatic replication, to be practical on the cloud. This paper proposes a new storage engine based on the well-known Haystack storage engine, optimized in terms of service discovery and automated fault tolerance to make it more suitable for Cloud Native infrastructure, deployment, and applications. We use the object storage model to meet large-scale, high-frequency file storage needs, offering a simple and unified set of APIs for applications to access. We also take advantage of Kubernetes' sophisticated and automated toolchains to make cloud storage easier to deploy, more flexible to scale, and more stable to run.
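
To make point (iii) and the Haystack reference concrete, the following is a minimal sketch of the general Haystack-style idea only, not the Kubestorage implementation described in the paper. It assumes a single append-only volume file and an in-memory index that maps each key to an (offset, size) pair, so that a read costs one seek rather than a per-file metadata lookup on disk. The class name Volume, its methods, and the file name vol.dat are hypothetical and chosen for illustration.

```python
import os
import tempfile


class Volume:
    """Hypothetical append-only volume with an in-memory index.

    Illustrative only: many small objects share one large file, and the
    metadata needed to find them (offset, size) is kept in memory.
    """

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> (offset, size)
        # Create the volume file if it does not exist yet.
        open(path, "ab").close()

    def put(self, key, data):
        # The new object starts at the current end of the volume file.
        offset = os.path.getsize(self.path)
        with open(self.path, "ab") as f:
            f.write(data)
        self.index[key] = (offset, len(data))

    def get(self, key):
        # One in-memory lookup, then one seek and one read on disk.
        offset, size = self.index[key]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(size)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    vol = Volume(os.path.join(workdir, "vol.dat"))
    vol.put("a.txt", b"hello")
    vol.put("b.txt", b"world!")
    print(vol.get("b.txt"))
```

The sketch deliberately omits everything the abstract attributes to Kubestorage itself (service discovery, automated fault tolerance, replication, and Kubernetes integration); it only shows why caching file metadata in memory avoids a disk lookup per small file.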

