Research and optimization of massive small file processing performance based on Ceph
Aiwu Shi, Zhencai Tian, Ge Chen, Jiyong Min, Jun Wu
Conference on Electronic Information Engineering and Data Processing, 26 May 2023
DOI: 10.1117/12.2682512
Abstract
The rapid development of technology generates enormous numbers of files across many domains. Ceph is a scalable, reliable, high-performance storage solution widely used in cloud computing, yet storing and processing massive numbers of small files remains one of its significant challenges: with many small files, problems such as write amplification cause performance bottlenecks. This paper proposes a novel technique, the Extended Small Files Processing Framework (ESFPF). First, to store files efficiently, small files are deduplicated and then merged, which reduces the number of data blocks Ceph must manage, lowering its load and making data processing more efficient. Second, a prefetching mechanism and a file index are introduced to make accessing small files more efficient. Experimental results indicate that the proposed approach can improve the efficiency of storing and accessing massive numbers of small files on Ceph.
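The abstract names the two mechanisms without giving their details. The Python sketch below illustrates one plausible way to combine them, under stated assumptions. Content is deduplicated by SHA-256 digest. Unique contents are appended into fixed-capacity merged containers (in-memory bytearrays standing in for large Ceph objects). A name → (container, offset, length) index locates each file, and a read miss prefetches every file in the same container into an LRU cache. The class `MergedSmallFileStore`, the 4 MiB default container size, and the whole-container prefetch policy are hypothetical choices for illustration only. They are not ESFPF's actual design and do not correspond to any Ceph API.

```python
import hashlib
from collections import OrderedDict
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """Location of one small file inside a merged container (hypothetical)."""
    container: int  # which merged object holds the bytes
    offset: int     # byte offset inside that object
    length: int     # file size in bytes


class MergedSmallFileStore:
    """Hypothetical sketch: dedup + merge on write, index + prefetch on read.

    In-memory bytearrays stand in for large Ceph (RADOS) objects; this is
    neither the ESFPF implementation nor a Ceph API.
    """

    def __init__(self, container_size=4 * 1024 * 1024, cache_files=1024):
        self.container_size = container_size  # assumed merge target size
        self.containers = [bytearray()]       # merged container objects
        self.index = {}                       # file name -> IndexEntry
        self.by_hash = {}                     # SHA-256 digest -> IndexEntry
        self.cache = OrderedDict()            # LRU cache: name -> bytes
        self.cache_files = cache_files
        self.backend_reads = 0                # simulated container reads

    def write(self, name, data):
        digest = hashlib.sha256(data).digest()
        entry = self.by_hash.get(digest)
        if entry is None:
            # New content: roll over to a fresh container if the current
            # non-empty one would exceed its capacity, then append.
            current = self.containers[-1]
            if current and len(current) + len(data) > self.container_size:
                self.containers.append(bytearray())
            buf = self.containers[-1]
            entry = IndexEntry(len(self.containers) - 1, len(buf), len(data))
            buf.extend(data)
            self.by_hash[digest] = entry
        # Duplicate content reuses the existing entry: no new bytes stored.
        self.index[name] = entry

    def read(self, name):
        if name in self.cache:
            self.cache.move_to_end(name)  # mark as recently used
            return self.cache[name]
        entry = self.index[name]
        # Prefetch (assumed policy): one backend read fetches the whole
        # container, and every file indexed in it is cached, on the guess
        # that files merged together tend to be read together. The linear
        # scan of the index keeps the sketch short.
        blob = bytes(self.containers[entry.container])
        self.backend_reads += 1
        for other, e in self.index.items():
            if e.container == entry.container:
                self._cache_put(other, blob[e.offset:e.offset + e.length])
        return blob[entry.offset:entry.offset + entry.length]

    def _cache_put(self, name, data):
        self.cache[name] = data
        self.cache.move_to_end(name)
        while len(self.cache) > self.cache_files:
            self.cache.popitem(last=False)  # evict least recently used
```

Counting `backend_reads` makes the intended effect visible. For example, storing `a.txt`, a duplicate `b.txt`, and `c.txt` in one 16-byte container keeps only 11 bytes. Reading `a.txt` then costs one simulated container read, and the later reads of `b.txt` and `c.txt` are served from the cache. A real system would also have to handle deletes, container compaction, and persisting the index, all of which this sketch omits.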