大数据中的匿名算法和方法综述

Q1 Decision Sciences

Annals of Data Science Pub Date : 2024-07-13 DOI:10.1007/s40745-024-00557-w

Elham Shamsinejad, Touraj Banirostam, Mir Mohsen Pedram, Amir Masoud Rahmani

{"title":"大数据中的匿名算法和方法综述","authors":"Elham Shamsinejad, Touraj Banirostam, Mir Mohsen Pedram, Amir Masoud Rahmani","doi":"10.1007/s40745-024-00557-w","DOIUrl":null,"url":null,"abstract":"<div><p>In the era of big data, with the increase in volume and complexity of data, the main challenge is how to use big data while preserving the privacy of users. This study was conducted with the aim of finding a solution to this challenge. In this study, we examined various data anonymization methods, including differential privacy, advanced encryption, and strong access controls. In addition, the operation, advantages, disadvantages, and use of these methods, the challenges of adapting these methods to big data, and possible solutions for them were also examined. Our results show that traditional data anonymization methods lack scalability, leading to privacy breaches and data loss. When faced with large volumes of data, these methods may not be able to fully process the data. Also, these methods may be ineffective against re-identification attacks, linkage attacks, and inference attacks. We introduced emerging methods that are capable of providing improved privacy with minimal data loss. These methods have scalability for big data. Finally, we examined future research works and raised important questions that can help improve existing algorithms or develop new methods, better manage the complexity and scale of unstructured data.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 1","pages":"253 - 279"},"PeriodicalIF":0.0000,"publicationDate":"2024-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Review of Anonymization Algorithms and Methods in Big Data\",\"authors\":\"Elham Shamsinejad, Touraj Banirostam, Mir Mohsen Pedram, Amir Masoud Rahmani\",\"doi\":\"10.1007/s40745-024-00557-w\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>In the era of big data, with the increase in volume and complexity of data, the main challenge is how to use big data while preserving the privacy of users. This study was conducted with the aim of finding a solution to this challenge. In this study, we examined various data anonymization methods, including differential privacy, advanced encryption, and strong access controls. In addition, the operation, advantages, disadvantages, and use of these methods, the challenges of adapting these methods to big data, and possible solutions for them were also examined. Our results show that traditional data anonymization methods lack scalability, leading to privacy breaches and data loss. When faced with large volumes of data, these methods may not be able to fully process the data. Also, these methods may be ineffective against re-identification attacks, linkage attacks, and inference attacks. We introduced emerging methods that are capable of providing improved privacy with minimal data loss. These methods have scalability for big data. Finally, we examined future research works and raised important questions that can help improve existing algorithms or develop new methods, better manage the complexity and scale of unstructured data.</p></div>\",\"PeriodicalId\":36280,\"journal\":{\"name\":\"Annals of Data Science\",\"volume\":\"12 1\",\"pages\":\"253 - 279\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-07-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Annals of Data Science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://link.springer.com/article/10.1007/s40745-024-00557-w\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"Decision Sciences\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annals of Data Science","FirstCategoryId":"1085","ListUrlMain":"https://link.springer.com/article/10.1007/s40745-024-00557-w","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"Decision Sciences","Score":null,"Total":0}

引用次数: 0

摘要

在大数据时代，随着数据量的增加和复杂性的增加，如何在使用大数据的同时保护用户的隐私是主要的挑战。这项研究的目的是找到解决这一挑战的办法。在本研究中，我们研究了各种数据匿名化方法，包括差异隐私、高级加密和强访问控制。此外，还研究了这些方法的操作、优缺点和使用，将这些方法应用于大数据的挑战，以及可能的解决方案。研究结果表明，传统的数据匿名化方法缺乏可扩展性，导致隐私泄露和数据丢失。当面对大量数据时，这些方法可能无法完全处理数据。此外，这些方法可能对重新识别攻击、链接攻击和推理攻击无效。我们介绍了能够以最小的数据丢失提供改进的隐私的新兴方法。这些方法对于大数据具有可扩展性。最后，我们研究了未来的研究工作，并提出了有助于改进现有算法或开发新方法的重要问题，以更好地管理非结构化数据的复杂性和规模。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A Review of Anonymization Algorithms and Methods in Big Data

In the era of big data, with the increase in volume and complexity of data, the main challenge is how to use big data while preserving the privacy of users. This study was conducted with the aim of finding a solution to this challenge. In this study, we examined various data anonymization methods, including differential privacy, advanced encryption, and strong access controls. In addition, the operation, advantages, disadvantages, and use of these methods, the challenges of adapting these methods to big data, and possible solutions for them were also examined. Our results show that traditional data anonymization methods lack scalability, leading to privacy breaches and data loss. When faced with large volumes of data, these methods may not be able to fully process the data. Also, these methods may be ineffective against re-identification attacks, linkage attacks, and inference attacks. We introduced emerging methods that are capable of providing improved privacy with minimal data loss. These methods have scalability for big data. Finally, we examined future research works and raised important questions that can help improve existing algorithms or develop new methods, better manage the complexity and scale of unstructured data.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Annals of Data Science Decision Sciences-Statistics, Probability and Uncertainty

CiteScore

6.50

自引率

0.00%

发文量

期刊介绍： Annals of Data Science (ADS) publishes cutting-edge research findings, experimental results and case studies of data science. Although Data Science is regarded as an interdisciplinary field of using mathematics, statistics, databases, data mining, high-performance computing, knowledge management and virtualization to discover knowledge from Big Data, it should have its own scientific contents, such as axioms, laws and rules, which are fundamentally important for experts in different fields to explore their own interests from Big Data. ADS encourages contributors to address such challenging problems at this exchange platform. At present, how to discover knowledge from heterogeneous data under Big Data environment needs to be addressed. ADS is a series of volumes edited by either the editorial office or guest editors. Guest editors will be responsible for call-for-papers and the review process for high-quality contributions in their volumes.