DeepHash
Yuanning Gao, Xiaofeng Gao, Guihai Chen
Proceedings of the 48th International Conference on Parallel Processing, 2019-08-05
DOI: 10.1145/3337821.3337924 (https://doi.org/10.1145/3337821.3337924)
In distributed file systems, distributed metadata management can be viewed as a mapping problem: how to effectively map the metadata namespace tree onto multiple metadata servers (MDSs). Traditional distributed metadata management schemes generally presume a rigid mapping function and therefore cannot adapt to the requirements of different applications. To better exploit the current distribution of the metadata, in this exploratory paper we present DeepHash, the first machine-learning-based model for this task, which uses a deep neural network to learn a locality preserving hashing (LPH) mapping. To help the model learn the positional relationships among metadata nodes in the namespace tree, we first present a metadata representation strategy. Because training labels (i.e., the hash values of metadata nodes) are unavailable, we design two loss functions with distinct characteristics, a pair loss and a triplet loss, each used separately to train DeepHash, and introduce sampling strategies for both approaches. We conduct extensive experiments on the Amazon EC2 platform to compare DeepHash with traditional and state-of-the-art schemes. The results show that DeepHash preserves metadata locality well while maintaining good load balance, demonstrating its effectiveness and efficiency.
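To make the idea of label-free training concrete, the following is a minimal sketch of what a pair loss and a triplet loss over scalar hash values could look like. It is not the formulation used in the paper, which the abstract does not give. The contrastive pair form, the margin-based triplet form, the margin values, the use of path edge distance as the locality signal, and the range-partition step in `assign_server` are all assumptions made for illustration, and every function name here is hypothetical.

```python
def tree_distance(path_a: str, path_b: str) -> int:
    """Edges between two nodes in the namespace tree (assumed locality signal)."""
    a = [p for p in path_a.split("/") if p]
    b = [p for p in path_b.split("/") if p]
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    # Steps up from each node to the deepest common ancestor.
    return (len(a) - common) + (len(b) - common)


def pair_loss(h_a: float, h_b: float, similar: bool, margin: float = 0.5) -> float:
    """Contrastive-style pair loss (assumed form, not taken from the paper).

    Similar pairs are pulled together; dissimilar pairs are pushed
    at least `margin` apart in hash space.
    """
    d = abs(h_a - h_b)
    if similar:
        return d * d
    return max(0.0, margin - d) ** 2


def triplet_loss(h_anchor: float, h_pos: float, h_neg: float,
                 margin: float = 0.2) -> float:
    """Margin-based triplet loss (assumed form, not taken from the paper).

    Zero once the positive is closer to the anchor than the negative
    by at least `margin`.
    """
    return max(0.0, abs(h_anchor - h_pos) - abs(h_anchor - h_neg) + margin)


def assign_server(h: float, num_servers: int) -> int:
    """Range-partition a hash value in [0, 1] across servers (assumption)."""
    return min(int(h * num_servers), num_servers - 1)
```

In this sketch, a sampler would pick node pairs or triplets from the namespace tree and use `tree_distance` to decide which nodes count as near or far; for example, siblings such as `/a/b/c` and `/a/b/d` are two edges apart and would be treated as a similar pair. A network producing the hash values would then be trained to minimize one of the two losses, so that nearby nodes receive nearby hash values and tend to land on the same server.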