SD-HDFS: Secure Deletion in Hadoop Distributed File System

B. Agrawal, R. Hansen, Chunming Rong, T. Wiktorski
DOI: 10.1109/BigDataCongress.2016.30
Published in: 2016 IEEE International Congress on Big Data (BigData Congress), June 2016
Citations: 6

Abstract

Sensitive information that is stored in Hadoop clusters can potentially be retrieved without permission or access granted. In addition, the ability to recover deleted data from Hadoop clusters represents a major security threat. Hadoop clusters are used to manage large amounts of data both within and outside of organizations. As a result, it has become important to be able to locate and remove data effectively and efficiently. In this paper, we propose Secure Delete, a holistic framework that propagates file information to the block management layer via an auxiliary communication path. The framework tracks down undeleted data blocks and modifies the normal deletion operation in the Hadoop Distributed File System (HDFS). We introduce CheckerNode, which generates a summary report from all DataNodes and compares the block information with the metadata from the NameNode. If the metadata do not contain the entries for the data blocks, unsynchronized blocks are automatically deleted. However, deleted data could still be recovered using digital forensics tools. We also describe a novel secure deletion technique in HDFS that generates a random pattern and writes multiple times to the disk location of the data block.
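The two mechanisms the abstract describes — reconciling DataNode block reports against NameNode metadata to find unsynchronized blocks, and multi-pass random overwriting of a block's disk location — can be sketched roughly as below. This is a minimal illustrative sketch, not the paper's implementation: the function names are assumptions, and it treats a block as an ordinary local file, whereas SD-HDFS operates on HDFS block files inside DataNode storage directories.

```python
import os
import secrets

def find_orphan_blocks(datanode_blocks, namenode_metadata):
    """Blocks reported by DataNodes but missing from the NameNode
    metadata are treated as unsynchronized (orphaned) and eligible
    for automatic deletion, as the CheckerNode summary-report
    comparison suggests."""
    return [b for b in datanode_blocks if b not in namenode_metadata]

def secure_overwrite(path, passes=3):
    """Overwrite a block file with a fresh random pattern several
    times before unlinking it, so the original bytes are not
    recoverable with digital forensics tools."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            f.write(secrets.token_bytes(size))  # new random pattern each pass
            f.flush()
            os.fsync(f.fileno())  # push the pattern to physical storage
    os.remove(path)
```

Note that on modern SSDs with wear-leveling, overwriting a logical file offset does not guarantee the old physical cells are erased; the overwrite guarantee in the sketch holds only under the assumption of in-place storage, which is why the paper targets the block layer rather than user-level files.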