Passent M. ElKafrawy, Amr M. Sauber, Mohamed M. Hafez
{"title":"HDFSX: Big data Distributed File System with small files support","authors":"Passent M. ElKafrawy, Amr M. Sauber, Mohamed M. Hafez","doi":"10.1109/ICENCO.2016.7856457","DOIUrl":null,"url":null,"abstract":"Hadoop Distributed File System (HDFS) is a file system designed to handle large files - which are in gigabytes or terabytes size - with streaming data access patterns, running clusters on commodity hardware. However, big data may exist in a huge number of small files such as: in biology, astronomy or some applications generating 30 million files with an average size of 190 Kbytes. Unfortunately, HDFS wouldn't be able to handle such kind of fractured big data because single Namenode is considered a bottleneck when handling large number of small files. In this paper, we present a new structure for HDFS (HDFSX) to avoid higher memory usage, flooding network, requests overhead and centralized point of failure (single point of failure “SPOF”) of the single Namenode.","PeriodicalId":332360,"journal":{"name":"2016 12th International Computer Engineering Conference (ICENCO)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 12th International Computer Engineering Conference (ICENCO)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICENCO.2016.7856457","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 12
Abstract
The Hadoop Distributed File System (HDFS) is a file system designed to handle large files, on the order of gigabytes or terabytes, with streaming data access patterns, running on clusters of commodity hardware. However, big data may also exist as a huge number of small files, for example in biology or astronomy, where some applications generate 30 million files with an average size of 190 KB. Unfortunately, HDFS cannot handle this kind of fragmented big data well, because the single Namenode becomes a bottleneck when managing a large number of small files. In this paper, we present a new structure for HDFS (HDFSX) that avoids the high memory usage, network flooding, request overhead, and single point of failure (SPOF) of the single Namenode.
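To make the Namenode bottleneck concrete, the sketch below gives a rough, back-of-the-envelope estimate of Namenode heap usage for the workload mentioned in the abstract (30 million files averaging 190 KB). The ~150 bytes-per-metadata-object figure and the comparison against 1 GB files are illustrative assumptions commonly cited for HDFS, not numbers or a method taken from the paper.

```python
# Illustrative sketch (not from the paper): why many small files strain the
# single Namenode. Assumes ~150 bytes of Namenode heap per metadata object
# (file inode or block), a commonly cited HDFS rule of thumb.

BYTES_PER_METADATA_OBJECT = 150   # assumed per-object Namenode heap cost
BLOCK_SIZE = 128 * 1024 * 1024    # default HDFS block size (128 MB)

def namenode_heap_bytes(num_files: int, avg_file_size: int) -> int:
    """Estimate Namenode heap needed to track num_files files of avg_file_size bytes."""
    blocks_per_file = max(1, -(-avg_file_size // BLOCK_SIZE))  # ceiling division
    objects = num_files * (1 + blocks_per_file)                # one inode + its blocks
    return objects * BYTES_PER_METADATA_OBJECT

if __name__ == "__main__":
    # The workload from the abstract: 30 million files averaging 190 KB each.
    small = namenode_heap_bytes(30_000_000, 190 * 1024)
    # The same total data volume stored as 1 GB files instead.
    total_bytes = 30_000_000 * 190 * 1024
    large = namenode_heap_bytes(total_bytes // (1024 ** 3), 1024 ** 3)
    print(f"30M small files      : ~{small / 1024 ** 3:.1f} GB of Namenode heap")
    print(f"Same data, 1 GB files: ~{large / 1024 ** 2:.1f} MB of Namenode heap")
```

Under these assumptions the small-file workload needs gigabytes of Namenode heap just for metadata, while the same data volume packed into large files needs only a few megabytes, which is the memory pressure and request overhead the abstract attributes to the single Namenode.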