Improving Storage Efficiency for Raw Image Photo Repository by Exploiting Similarity
Binqi Zhang, Chen Wang, B. Zhou, Albert Y. Zomaya
2016 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), 2016-12-01
DOI: 10.1109/PDCAT.2016.045 (https://doi.org/10.1109/PDCAT.2016.045)
Citations: 0
Abstract
Exploiting temporal and spatial locality is one way to improve the performance of data compression and deduplication in a storage system. Our evaluation finds that content-level similarity measures, such as similar photo tags, correlate to some degree with data compressibility. Raw images with similar tags can therefore be compressed together for better storage space savings. Furthermore, storing similar raw images together reduces fragmentation and enables rapid sorting, searching, and retrieval when the images are kept in a large-scale distributed environment. In this paper, we present the correlation between content similarity and data compressibility, measured on a dataset built from Flickr. Based on this evaluation, we propose a system design that optimizes storage efficiency for the Top-N relevant images sharing the same tag. The design saves storage space and may also accelerate Top-N relevance search queries.
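The idea of compressing same-tag images together can be illustrated with a minimal Python sketch. It is not the authors' system: the abstract does not name a compressor, so `zlib` is an assumption, and the helper names (`group_by_tag`, `individual_size`, `grouped_size`, `synthetic_blob`) are hypothetical. Short synthetic byte strings stand in for raw images. Real raw files are far larger than zlib's 32 KiB window, so a practical system would likely need a long-range or delta-based compressor to see a comparable effect.

```python
import random
import zlib
from collections import defaultdict


def group_by_tag(items):
    """Map each tag to the list of blobs carrying it; items are (tag, blob) pairs."""
    groups = defaultdict(list)
    for tag, blob in items:
        groups[tag].append(blob)
    return dict(groups)


def individual_size(blobs):
    """Total compressed bytes when every blob is compressed on its own."""
    return sum(len(zlib.compress(b, 9)) for b in blobs)


def grouped_size(blobs):
    """Compressed bytes when the blobs are concatenated into one stream."""
    return len(zlib.compress(b"".join(blobs), 9))


def synthetic_blob(seed, variant, size=2000):
    """Hypothetical stand-in for a raw image: blobs with the same seed share
    almost all bytes and differ only at position `variant`."""
    rng = random.Random(seed)
    data = bytearray(rng.getrandbits(8) for _ in range(size))
    data[variant] ^= 0xFF  # small per-image difference
    return bytes(data)


if __name__ == "__main__":
    # Two tags, four near-identical synthetic "images" per tag (assumed data).
    items = [("beach", synthetic_blob(1, i)) for i in range(4)]
    items += [("forest", synthetic_blob(2, i)) for i in range(4)]
    for tag, blobs in group_by_tag(items).items():
        print(tag, "individual:", individual_size(blobs),
              "grouped:", grouped_size(blobs))
```

In this toy setup the grouped stream is smaller because the compressor can reference bytes from an earlier, similar blob. Whether tags predict such redundancy in real photos is the empirical question the paper evaluates.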