N. Karpagam, G. K. Thrilokesh, J. Shanker, K. Harish, M. Raja
{"title":"使用Hadoop框架拆分读取数据节点上的冗余数据集","authors":"N. Karpagam, G. K. Thrilokesh, J. Shanker, K. Harish, M. Raja","doi":"10.1109/ICAMMAET.2017.8186711","DOIUrl":null,"url":null,"abstract":"The process of storing and developing of big data is done with the help of Hadoop which is an open-source framework in a distributive arena across groups of systems using plain scheduling models. According to this framework HDFS (Hadoop Distributed File System) replicates datasets into two additional data nodes by default to achieve availability during failure of any components. The read and write activities of the data nodes is done with the file system based on instruction given by name node. The reading of data collection from different data node is done completely in parallel for the different data block of one data. So that if any failure of one block it would get the other location of its replicated block and read data block which would take up some time for it. In this paper the data collection are read in two different orders on two different data nodes of same data block as such from top to the middle and bottom to the middle respectively. In case of failure in any data node the other half of the other data node is read. Which is then processed using map reduce technique for analysis.","PeriodicalId":425974,"journal":{"name":"2017 International Conference on Algorithms, Methodology, Models and Applications in Emerging Technologies (ICAMMAET)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Split reading of redundant datasets on datanodes using Hadoop framework\",\"authors\":\"N. Karpagam, G. K. Thrilokesh, J. Shanker, K. Harish, M. Raja\",\"doi\":\"10.1109/ICAMMAET.2017.8186711\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The process of storing and developing of big data is done with the help of Hadoop which is an open-source framework in a distributive arena across groups of systems using plain scheduling models. According to this framework HDFS (Hadoop Distributed File System) replicates datasets into two additional data nodes by default to achieve availability during failure of any components. The read and write activities of the data nodes is done with the file system based on instruction given by name node. The reading of data collection from different data node is done completely in parallel for the different data block of one data. So that if any failure of one block it would get the other location of its replicated block and read data block which would take up some time for it. In this paper the data collection are read in two different orders on two different data nodes of same data block as such from top to the middle and bottom to the middle respectively. In case of failure in any data node the other half of the other data node is read. 
Which is then processed using map reduce technique for analysis.\",\"PeriodicalId\":425974,\"journal\":{\"name\":\"2017 International Conference on Algorithms, Methodology, Models and Applications in Emerging Technologies (ICAMMAET)\",\"volume\":\"22 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 International Conference on Algorithms, Methodology, Models and Applications in Emerging Technologies (ICAMMAET)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICAMMAET.2017.8186711\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 International Conference on Algorithms, Methodology, Models and Applications in Emerging Technologies (ICAMMAET)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICAMMAET.2017.8186711","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Split reading of redundant datasets on datanodes using Hadoop framework
Hadoop is an open-source framework for storing and processing big data across clusters of machines using simple scheduling and programming models. Within this framework, HDFS (the Hadoop Distributed File System) replicates each data block onto two additional datanodes by default, so that the data remains available if any component fails. Read and write operations on the datanodes are carried out by the file system under instructions from the namenode. The different blocks of a file are read fully in parallel from different datanodes, but each individual block is read from a single replica; if that replica fails, the client must locate another replica of the block and re-read it, which costs additional time. In this paper, the same data block is read in two different orders on two different datanodes: from the top toward the middle on one replica and from the bottom toward the middle on the other. If either datanode fails, the remaining half is read from the other datanode. The data is then processed with the MapReduce technique for analysis.
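As a rough illustration of the split-read idea described in the abstract (not the authors' implementation), the following Java sketch reads the two halves of a replicated file region through two independent HDFS input streams, using the standard org.apache.hadoop.fs positioned-read API. HDFS itself chooses which replica serves each stream, so this only approximates the paper's "one half per datanode" scheme; the class name SplitBlockReader and the half-and-half partitioning are illustrative assumptions.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.util.concurrent.CompletableFuture;

// Illustrative sketch only: reads the top and bottom halves of a file in
// parallel over two separate HDFS streams. If one half fails, only that half
// needs to be re-read from another replica, mirroring the split-read idea.
public class SplitBlockReader {

    public static byte[] splitRead(Configuration conf, Path file) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        long len = fs.getFileStatus(file).getLen();
        int half = (int) (len / 2);
        byte[] data = new byte[(int) len];

        // Two independent streams; HDFS may serve them from different
        // datanode replicas.
        CompletableFuture<Void> top = CompletableFuture.runAsync(() -> {
            try (FSDataInputStream in = fs.open(file)) {
                // Positioned read of the top half: offset 0 .. half-1.
                in.readFully(0, data, 0, half);
            } catch (Exception e) {
                throw new RuntimeException("top-half read failed", e);
            }
        });
        CompletableFuture<Void> bottom = CompletableFuture.runAsync(() -> {
            try (FSDataInputStream in = fs.open(file)) {
                // Positioned read of the bottom half: offset half .. len-1.
                in.readFully(half, data, half, (int) len - half);
            } catch (Exception e) {
                throw new RuntimeException("bottom-half read failed", e);
            }
        });

        // Wait for both halves; on failure of one future, only that half
        // would need to be retried against another replica.
        CompletableFuture.allOf(top, bottom).join();
        return data;
    }
}
```

The reassembled byte array could then be fed into a MapReduce job for analysis, as the abstract indicates.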