{"title":"分级布隆过滤器阵列(HBA):一种新颖的、可扩展的元数据管理系统,用于基于集群的大型存储","authors":"Yifeng Zhu, Hong Jiang, Jun Wang","doi":"10.1109/CLUSTR.2004.1392614","DOIUrl":null,"url":null,"abstract":"An efficient and distributed scheme for file mapping or file lookup scheme is critical in decentralizing metadata management within a group of metadata servers. This work presents a technique called HBA (hierarchical Bloom filter arrays) to map file names to the servers holding their metadata. Two levels of probabilistic arrays, i.e., Bloom filter arrays, with different accuracies are used on each metadata server. One array, with lower accuracy and representing the distribution of the entire metadata, trades accuracy for significantly reduced memory overhead, while the other array, with higher accuracy, caches partial distribution information and exploits the temporal locality of file access patterns. Extensive trace-driven simulations have shown our HBA design to be highly effective and efficient in improving performance and scalability of file systems in clusters with 1,000 to 10,000 nodes (or superclusters).","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"56","resultStr":"{\"title\":\"Hierarchical Bloom filter arrays (HBA): a novel, scalable metadata management system for large cluster-based storage\",\"authors\":\"Yifeng Zhu, Hong Jiang, Jun Wang\",\"doi\":\"10.1109/CLUSTR.2004.1392614\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"An efficient and distributed scheme for file mapping or file lookup scheme is critical in decentralizing metadata management within a group of metadata servers. This work presents a technique called HBA (hierarchical Bloom filter arrays) to map file names to the servers holding their metadata. Two levels of probabilistic arrays, i.e., Bloom filter arrays, with different accuracies are used on each metadata server. One array, with lower accuracy and representing the distribution of the entire metadata, trades accuracy for significantly reduced memory overhead, while the other array, with higher accuracy, caches partial distribution information and exploits the temporal locality of file access patterns. Extensive trace-driven simulations have shown our HBA design to be highly effective and efficient in improving performance and scalability of file systems in clusters with 1,000 to 10,000 nodes (or superclusters).\",\"PeriodicalId\":123512,\"journal\":{\"name\":\"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)\",\"volume\":\"20 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2004-09-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"56\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CLUSTR.2004.1392614\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CLUSTR.2004.1392614","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Hierarchical Bloom filter arrays (HBA): a novel, scalable metadata management system for large cluster-based storage
An efficient and distributed scheme for file mapping or file lookup scheme is critical in decentralizing metadata management within a group of metadata servers. This work presents a technique called HBA (hierarchical Bloom filter arrays) to map file names to the servers holding their metadata. Two levels of probabilistic arrays, i.e., Bloom filter arrays, with different accuracies are used on each metadata server. One array, with lower accuracy and representing the distribution of the entire metadata, trades accuracy for significantly reduced memory overhead, while the other array, with higher accuracy, caches partial distribution information and exploits the temporal locality of file access patterns. Extensive trace-driven simulations have shown our HBA design to be highly effective and efficient in improving performance and scalability of file systems in clusters with 1,000 to 10,000 nodes (or superclusters).