SD-HDFS: Secure Deletion in Hadoop Distributed File System
B. Agrawal, R. Hansen, Chunming Rong, T. Wiktorski
2016 IEEE International Congress on Big Data (BigData Congress), June 2016. DOI: 10.1109/BigDataCongress.2016.30
Abstract: Sensitive information stored in Hadoop clusters can potentially be retrieved without authorization, and the ability to recover deleted data from such clusters poses a major security threat. Hadoop clusters manage large amounts of data both within and outside organizations, so it has become important to locate and remove data effectively and efficiently. In this paper, we propose Secure Delete, a holistic framework that propagates file information to the block-management layer via an auxiliary communication path. The framework tracks down undeleted data blocks and modifies the normal deletion operation in the Hadoop Distributed File System (HDFS). We introduce the CheckerNode, which generates a summary report from all DataNodes and compares the reported block information with the metadata held by the NameNode; blocks for which the metadata contain no entry are treated as unsynchronized and automatically deleted. Even so, deleted data could still be recovered with digital forensics tools, so we also describe a novel secure deletion technique for HDFS that generates a random pattern and writes it multiple times to the disk location of the data block.
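To make the CheckerNode's consistency check concrete, the following is a minimal sketch of the comparison the abstract describes: block IDs reported by DataNodes are checked against the NameNode's metadata, and any block with no metadata entry is flagged as unsynchronized. The class and method names here are illustrative assumptions, not the paper's actual API, and the in-memory sets stand in for real block reports.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the CheckerNode consistency check: compare the
// blocks reported by DataNodes against the NameNode's metadata and flag
// any block the metadata does not know about.
public class CheckerNodeSketch {

    /** Returns blocks present on DataNodes but absent from NameNode metadata. */
    static Set<Long> findUnsynchronizedBlocks(Set<Long> dataNodeBlocks,
                                              Set<Long> nameNodeBlocks) {
        Set<Long> orphans = new HashSet<>(dataNodeBlocks);
        orphans.removeAll(nameNodeBlocks); // keep only blocks with no metadata entry
        return orphans;
    }

    public static void main(String[] args) {
        Set<Long> reported = Set.of(1001L, 1002L, 1003L); // summary report from DataNodes
        Set<Long> metadata = Set.of(1001L, 1003L);        // entries known to the NameNode
        // Block 1002 has no metadata entry, so it would be scheduled for deletion.
        System.out.println("Unsynchronized: "
                + findUnsynchronizedBlocks(reported, metadata));
    }
}
```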
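The secure deletion step can likewise be sketched as a multi-pass random overwrite of the block's on-disk file before it is unlinked. The pass count, buffer size, and block path below are arbitrary assumptions for illustration; the paper's actual parameters may differ, and on SSDs or copy-on-write file systems in-place overwriting gives weaker guarantees than on spinning disks.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.security.SecureRandom;

// Illustrative sketch of multi-pass secure deletion: overwrite the block
// file several times with freshly generated random patterns, syncing each
// pass to disk, before the file is removed.
public class SecureOverwriteSketch {

    static void overwrite(String blockPath, int passes) throws IOException {
        SecureRandom rng = new SecureRandom();
        byte[] buffer = new byte[8192];
        try (RandomAccessFile file = new RandomAccessFile(blockPath, "rw")) {
            long length = file.length();
            for (int pass = 0; pass < passes; pass++) {
                file.seek(0);
                long remaining = length;
                while (remaining > 0) {
                    rng.nextBytes(buffer); // fresh random pattern for each write
                    int n = (int) Math.min(buffer.length, remaining);
                    file.write(buffer, 0, n);
                    remaining -= n;
                }
                file.getFD().sync(); // flush this pass to the physical device
            }
        }
    }

    public static void main(String[] args) throws IOException {
        overwrite("/tmp/blk_1002", 3); // hypothetical block file path, 3 passes
    }
}
```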