DeepHash

Proceedings of the 48th International Conference on Parallel Processing Pub Date : 2019-08-05 DOI:10.1145/3337821.3337924

Yuanning Gao, Xiaofeng Gao, Guihai Chen

{"title":"DeepHash","authors":"Yuanning Gao, Xiaofeng Gao, Guihai Chen","doi":"10.1145/3337821.3337924","DOIUrl":null,"url":null,"abstract":"In distributed file systems, distributed metadata management can be considered as a mapping problem, i.e., how to effectively map the metadata namespace tree to multiple metadata servers (MDS's). In general, all traditional distributed metadata management schemes simply presume a rigid mapping function, thus failing to adaptively meet the requirements of different applications. To better take advantage of the current distribution of the metadata, in this exploratory paper, we present the first machine learning based model called DeepHash, which leverages the deep neural network to learn a locality preserving hashing (LPH) mapping. To help learn a good position relationship of metadata nodes in the namespace tree, we first present a metadata representation strategy. Due to the absence of training labels, i.e., the hash values of metadata nodes, we design two kinds of loss functions with distinctive characters to train DeepHash respectively, including a pair loss and a triplet loss, and introduce some sampling strategies for these two approaches. We conduct extensive experiments on Amazon EC2 platform to compare the performance of DeepHash with traditional and state-of-the-art schemes. The results demonstrate that DeepHash can preserve the metadata locality well while maintaining a high load balancing, which denotes the effectiveness and efficiency of DeepHash.","PeriodicalId":405273,"journal":{"name":"Proceedings of the 48th International Conference on Parallel Processing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"DeepHash\",\"authors\":\"Yuanning Gao, Xiaofeng Gao, Guihai Chen\",\"doi\":\"10.1145/3337821.3337924\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In distributed file systems, distributed metadata management can be considered as a mapping problem, i.e., how to effectively map the metadata namespace tree to multiple metadata servers (MDS's). In general, all traditional distributed metadata management schemes simply presume a rigid mapping function, thus failing to adaptively meet the requirements of different applications. To better take advantage of the current distribution of the metadata, in this exploratory paper, we present the first machine learning based model called DeepHash, which leverages the deep neural network to learn a locality preserving hashing (LPH) mapping. To help learn a good position relationship of metadata nodes in the namespace tree, we first present a metadata representation strategy. Due to the absence of training labels, i.e., the hash values of metadata nodes, we design two kinds of loss functions with distinctive characters to train DeepHash respectively, including a pair loss and a triplet loss, and introduce some sampling strategies for these two approaches. We conduct extensive experiments on Amazon EC2 platform to compare the performance of DeepHash with traditional and state-of-the-art schemes. The results demonstrate that DeepHash can preserve the metadata locality well while maintaining a high load balancing, which denotes the effectiveness and efficiency of DeepHash.\",\"PeriodicalId\":405273,\"journal\":{\"name\":\"Proceedings of the 48th International Conference on Parallel Processing\",\"volume\":\"16 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-08-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 48th International Conference on Parallel Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3337821.3337924\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 48th International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3337821.3337924","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

DeepHash

In distributed file systems, distributed metadata management can be considered as a mapping problem, i.e., how to effectively map the metadata namespace tree to multiple metadata servers (MDS's). In general, all traditional distributed metadata management schemes simply presume a rigid mapping function, thus failing to adaptively meet the requirements of different applications. To better take advantage of the current distribution of the metadata, in this exploratory paper, we present the first machine learning based model called DeepHash, which leverages the deep neural network to learn a locality preserving hashing (LPH) mapping. To help learn a good position relationship of metadata nodes in the namespace tree, we first present a metadata representation strategy. Due to the absence of training labels, i.e., the hash values of metadata nodes, we design two kinds of loss functions with distinctive characters to train DeepHash respectively, including a pair loss and a triplet loss, and introduce some sampling strategies for these two approaches. We conduct extensive experiments on Amazon EC2 platform to compare the performance of DeepHash with traditional and state-of-the-art schemes. The results demonstrate that DeepHash can preserve the metadata locality well while maintaining a high load balancing, which denotes the effectiveness and efficiency of DeepHash.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 48th International Conference on Parallel Processing

自引率

0.00%

发文量