{"title":"迈向可扩展的HDFS架构","authors":"Farag Azzedin","doi":"10.1109/CTS.2013.6567222","DOIUrl":null,"url":null,"abstract":"Cloud computing infrastructures allow corporations to reduce costs by outsourcing computations on-demand. One of the areas cloud computing is increasingly being utilized for is large scale data processing. Apache Hadoop is one of these large scale data processing projects that supports data-intensive distributed applications. Hadoop applications utilize a distributed file system for data storage called Hadoop Distributed File System (HDFS). HDFS architecture, by design, has only a single master node called ame ode, which manages and maintains the metadata of storage nodes, called Datanodes, in its RAM. Hence, HDFS Datanodes' metadata is restricted by the capacity of the RAM of the HDFS's single-point-of-failure ame ode. This paper proposes a fault tolerant, highly available and widely scalable HDFS architecture. The proposed architecture provides a distributed ame ode space eliminating the drawbacks of the current HDFS architecture. This is achieved by integrating the Chord protocol into the HDFS architecture.","PeriodicalId":256633,"journal":{"name":"2013 International Conference on Collaboration Technologies and Systems (CTS)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"46","resultStr":"{\"title\":\"Towards a scalable HDFS architecture\",\"authors\":\"Farag Azzedin\",\"doi\":\"10.1109/CTS.2013.6567222\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Cloud computing infrastructures allow corporations to reduce costs by outsourcing computations on-demand. One of the areas cloud computing is increasingly being utilized for is large scale data processing. Apache Hadoop is one of these large scale data processing projects that supports data-intensive distributed applications. Hadoop applications utilize a distributed file system for data storage called Hadoop Distributed File System (HDFS). HDFS architecture, by design, has only a single master node called ame ode, which manages and maintains the metadata of storage nodes, called Datanodes, in its RAM. Hence, HDFS Datanodes' metadata is restricted by the capacity of the RAM of the HDFS's single-point-of-failure ame ode. This paper proposes a fault tolerant, highly available and widely scalable HDFS architecture. The proposed architecture provides a distributed ame ode space eliminating the drawbacks of the current HDFS architecture. This is achieved by integrating the Chord protocol into the HDFS architecture.\",\"PeriodicalId\":256633,\"journal\":{\"name\":\"2013 International Conference on Collaboration Technologies and Systems (CTS)\",\"volume\":\"14 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-05-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"46\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 International Conference on Collaboration Technologies and Systems (CTS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CTS.2013.6567222\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 International Conference on Collaboration Technologies and Systems (CTS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CTS.2013.6567222","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Cloud computing infrastructures allow corporations to reduce costs by outsourcing computations on-demand. One of the areas cloud computing is increasingly being utilized for is large scale data processing. Apache Hadoop is one of these large scale data processing projects that supports data-intensive distributed applications. Hadoop applications utilize a distributed file system for data storage called Hadoop Distributed File System (HDFS). HDFS architecture, by design, has only a single master node called ame ode, which manages and maintains the metadata of storage nodes, called Datanodes, in its RAM. Hence, HDFS Datanodes' metadata is restricted by the capacity of the RAM of the HDFS's single-point-of-failure ame ode. This paper proposes a fault tolerant, highly available and widely scalable HDFS architecture. The proposed architecture provides a distributed ame ode space eliminating the drawbacks of the current HDFS architecture. This is achieved by integrating the Chord protocol into the HDFS architecture.