使用Hadoop对大型可搜索数据集进行初始加密

Proceedings of the 20th ACM Symposium on Access Control Models and Technologies Pub Date : 2015-06-01 DOI:10.1145/2752952.2752960

Feng Wang, Mathias Kohler, A. Schaad

{"title":"使用Hadoop对大型可搜索数据集进行初始加密","authors":"Feng Wang, Mathias Kohler, A. Schaad","doi":"10.1145/2752952.2752960","DOIUrl":null,"url":null,"abstract":"With the introduction and the widely use of external hosted infrastructures, secure storage of sensitive data becomes more and more important. There are systems available to store and query encrypted data in a database, but not all applications may start with empty tables rather than having sets of legacy data. Hence, there is a need to transform existing plaintext databases to encrypted form. Usually existing enterprise databases may contain terabytes of data. A single machine would require many months for the initial encryption of a large data set. We propose encrypting data in parallel using a Hadoop cluster which is a simple five step process including the Hadoop set up, target preparation, source data import, encrypting the data, and finally exporting it to the target. We evaluated our solution on real world data and report on performance and data consumption. The results show that encrypting data in parallel can be done in a very scalable manner. Using a parallelized encryption cluster compared to a single server machine reduces the encryption time from months down to days or even hours.","PeriodicalId":305802,"journal":{"name":"Proceedings of the 20th ACM Symposium on Access Control Models and Technologies","volume":"78 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Initial Encryption of large Searchable Data Sets using Hadoop\",\"authors\":\"Feng Wang, Mathias Kohler, A. Schaad\",\"doi\":\"10.1145/2752952.2752960\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the introduction and the widely use of external hosted infrastructures, secure storage of sensitive data becomes more and more important. There are systems available to store and query encrypted data in a database, but not all applications may start with empty tables rather than having sets of legacy data. Hence, there is a need to transform existing plaintext databases to encrypted form. Usually existing enterprise databases may contain terabytes of data. A single machine would require many months for the initial encryption of a large data set. We propose encrypting data in parallel using a Hadoop cluster which is a simple five step process including the Hadoop set up, target preparation, source data import, encrypting the data, and finally exporting it to the target. We evaluated our solution on real world data and report on performance and data consumption. The results show that encrypting data in parallel can be done in a very scalable manner. Using a parallelized encryption cluster compared to a single server machine reduces the encryption time from months down to days or even hours.\",\"PeriodicalId\":305802,\"journal\":{\"name\":\"Proceedings of the 20th ACM Symposium on Access Control Models and Technologies\",\"volume\":\"78 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 20th ACM Symposium on Access Control Models and Technologies\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2752952.2752960\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 20th ACM Symposium on Access Control Models and Technologies","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2752952.2752960","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

随着外部托管基础设施的引入和广泛使用，敏感数据的安全存储变得越来越重要。有一些系统可用于在数据库中存储和查询加密数据，但并非所有应用程序都可以从空表开始，而不是从遗留数据集开始。因此，需要将现有的明文数据库转换为加密形式。通常，现有的企业数据库可能包含数tb的数据。一台机器对一个大数据集进行初始加密需要好几个月的时间。我们建议使用Hadoop集群并行加密数据，这是一个简单的五步过程，包括Hadoop设置，目标准备，源数据导入，数据加密，最后导出到目标。我们根据真实世界的数据评估了我们的解决方案，并报告了性能和数据消耗情况。结果表明，并行数据加密可以以一种非常可扩展的方式完成。与使用单个服务器机器相比，使用并行加密集群可以将加密时间从几个月减少到几天甚至几个小时。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Initial Encryption of large Searchable Data Sets using Hadoop

With the introduction and the widely use of external hosted infrastructures, secure storage of sensitive data becomes more and more important. There are systems available to store and query encrypted data in a database, but not all applications may start with empty tables rather than having sets of legacy data. Hence, there is a need to transform existing plaintext databases to encrypted form. Usually existing enterprise databases may contain terabytes of data. A single machine would require many months for the initial encryption of a large data set. We propose encrypting data in parallel using a Hadoop cluster which is a simple five step process including the Hadoop set up, target preparation, source data import, encrypting the data, and finally exporting it to the target. We evaluated our solution on real world data and report on performance and data consumption. The results show that encrypting data in parallel can be done in a very scalable manner. Using a parallelized encryption cluster compared to a single server machine reduces the encryption time from months down to days or even hours.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 20th ACM Symposium on Access Control Models and Technologies

自引率

0.00%

发文量