Defragmenting DHT-based Distributed File Systems
Jeffrey Pang, Phillip B. Gibbons, M. Kaminsky, S. Seshan, Haifeng Yu
27th International Conference on Distributed Computing Systems (ICDCS '07), 2007-06-25. DOI: 10.1109/ICDCS.2007.97
Citations: 14
Abstract
Existing DHT-based file systems use consistent hashing to assign file blocks to random machines. As a result, a user task that accesses an entire file or multiple files must retrieve blocks from many different machines. This paper demonstrates that significant availability and performance gains can be achieved if, instead, users can retrieve all the data needed for a given task from only a few DHT nodes. We explore the design and implications of such a "defragmented" DHT-based distributed file system, called D2, which also maintains important DHT properties such as storage load balance. Using real-world file system traces, we show that a simple key encoding scheme is sufficient to maintain good defragmentation for most user tasks. Using both simulation and an actual 1,000-node deployment, we show that D2 increases availability by over an order of magnitude and improves user-perceived latency by 30-100% compared to a traditional design.
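The contrast between random block placement and a locality-preserving key encoding can be illustrated with a toy consistent-hashing ring. This is a minimal sketch, not D2's actual encoding, which the abstract does not specify. The 32-bit key space, the split of each key into a path-derived prefix plus a 12-bit block index, and all names (`h32`, `Ring`, `hashed_key`, `ordered_key`, `nodes_touched`) are hypothetical. The sketch counts how many distinct nodes a reader must contact to fetch every block of one 64-block file on a 1,000-node ring.

```python
import bisect
import hashlib

KEY_BITS = 32    # size of the toy identifier space (assumption)
BLOCK_BITS = 12  # low key bits reserved for the block index (assumption)


def h32(text: str) -> int:
    """Map a string to a 32-bit integer using truncated SHA-1."""
    return int.from_bytes(hashlib.sha1(text.encode()).digest()[:4], "big")


class Ring:
    """Consistent-hashing ring: each key is stored on its successor node."""

    def __init__(self, node_ids):
        self.ids = sorted(node_ids)

    def successor(self, key: int) -> int:
        i = bisect.bisect_left(self.ids, key)
        return self.ids[i % len(self.ids)]  # wrap around past the largest ID


def hashed_key(path: str, block: int) -> int:
    """Conventional placement: every block gets an independent random key."""
    return h32(f"{path}#{block}")


def ordered_key(path: str, block: int) -> int:
    """Hypothetical locality-preserving encoding: high bits come from the
    file path, low bits are the block index, so all blocks of one file
    occupy a single contiguous key range."""
    assert 0 <= block < (1 << BLOCK_BITS)
    prefix = h32(path) >> BLOCK_BITS
    return (prefix << BLOCK_BITS) | block


def nodes_touched(ring: Ring, path: str, nblocks: int, key_fn) -> int:
    """Number of distinct nodes that store at least one block of the file."""
    return len({ring.successor(key_fn(path, b)) for b in range(nblocks)})


ring = Ring(h32(f"node-{i}") for i in range(1000))
path = "/home/alice/thesis.tex"
print("hashed keys :", nodes_touched(ring, path, 64, hashed_key))
print("ordered keys:", nodes_touched(ring, path, 64, ordered_key))
```

With hashed keys the file should be spread over dozens of nodes, while the ordered encoding keeps it on one or two, so a task reading the file depends on far fewer machines being up and reachable. The toy encoding gives up the uniform key distribution that hashing provides, so nodes owning busy key ranges would store more data; the abstract notes that D2 also maintains storage load balance, which this sketch does not attempt.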