Privacy Utility Tradeoff Between PETs: Differential Privacy and Synthetic Data

IF 4.5 · Zone 2 (Computer Science) · Q1 COMPUTER SCIENCE, CYBERNETICS

IEEE Transactions on Computational Social Systems Pub Date : 2024-11-14 DOI:10.1109/TCSS.2024.3479317

Qaiser Razi;Sujoya Datta;Vikas Hassija;GSS Chalapathi;Biplab Sikdar

Volume 12, Issue 2, pp. 473-484 · Journal Article · Full text: https://ieeexplore.ieee.org/document/10753017/

Citations: 0

Abstract

Data privacy is a critical concern in the digital age. This problem has compounded with the evolution and increased adoption of machine learning (ML), which has necessitated balancing the security of sensitive information with model utility. Traditional data privacy techniques, such as differential privacy and anonymization, focus on protecting data at rest and in transit but often fail to maintain high utility for machine learning models due to their impact on data accuracy. In this article, we explore the use of synthetic data as a privacy-preserving method that can effectively balance data privacy and utility. Synthetic data is generated to replicate the statistical properties of the original dataset while obscuring identifying details, offering enhanced privacy guarantees. We evaluate the performance of synthetic data against differentially private and anonymized data in terms of prediction accuracy across various settings—different learning rates, network architectures, and datasets from various domains. Our findings demonstrate that synthetic data maintains higher utility (prediction accuracy) than differentially private and anonymized data. The study underscores the potential of synthetic data as a robust privacy-enhancing technology (PET) capable of preserving both privacy and data utility in machine learning environments.
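The comparison described in the abstract can be illustrated with a toy sketch. This is not the authors' experimental setup: the abstract does not say which differential privacy mechanism, synthetic data generator, or network architectures were used. The sketch below assumes Laplace noise added to clipped features as the differentially private variant, a per-class Gaussian fit as the synthetic generator, and a nearest-centroid classifier as a stand-in for the ML model. All function names and parameters (`laplace_perturb`, `gaussian_synthesize`, `epsilon`, `bound`) are hypothetical.

```python
import numpy as np


def make_blobs(rng, n_per_class=500, dim=2):
    """Toy two-class data: unit-variance Gaussian blobs centred at -2 and +2."""
    x0 = rng.normal(-2.0, 1.0, size=(n_per_class, dim))
    x1 = rng.normal(2.0, 1.0, size=(n_per_class, dim))
    X = np.vstack([x0, x1])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y


def laplace_perturb(rng, X, epsilon, bound=5.0):
    """Clip features to [-bound, bound], then add Laplace noise to every record.

    Assumption: each record is released on its own, so the L1 sensitivity is
    2 * bound * dim. Labels are left untouched, so this sketch only protects
    the features.
    """
    clipped = np.clip(X, -bound, bound)
    l1_sensitivity = 2.0 * bound * X.shape[1]
    noise = rng.laplace(0.0, l1_sensitivity / epsilon, size=X.shape)
    return clipped + noise


def gaussian_synthesize(rng, X, y):
    """Fit one Gaussian per class and sample as many synthetic rows as real ones."""
    parts, labels = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        mean = Xc.mean(axis=0)
        cov = np.cov(Xc, rowvar=False)
        parts.append(rng.multivariate_normal(mean, cov, size=len(Xc)))
        labels.append(np.full(len(Xc), c))
    return np.vstack(parts), np.concatenate(labels)


def centroid_accuracy(X_train, y_train, X_test, y_test):
    """Train a nearest-centroid classifier and return its test accuracy."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    pred = classes[dists.argmin(axis=1)]
    return float((pred == y_test).mean())


def run_comparison(seed=0, epsilon=1.0):
    """Train on original, perturbed, and synthetic data; test on real held-out data."""
    rng = np.random.default_rng(seed)
    X_train, y_train = make_blobs(rng)
    X_test, y_test = make_blobs(rng)
    X_dp = laplace_perturb(rng, X_train, epsilon)
    X_syn, y_syn = gaussian_synthesize(rng, X_train, y_train)
    return {
        "original": centroid_accuracy(X_train, y_train, X_test, y_test),
        "dp_laplace": centroid_accuracy(X_dp, y_train, X_test, y_test),
        "synthetic": centroid_accuracy(X_syn, y_syn, X_test, y_test),
    }


if __name__ == "__main__":
    for name, acc in run_comparison().items():
        print(f"{name}: {acc:.3f}")
```

Utility is measured here as accuracy on held-out real data, which mirrors the abstract's use of prediction accuracy. A per-class Gaussian fit carries no formal privacy guarantee on its own, so the toy numbers say nothing about the privacy side of the tradeoff.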

Source journal: IEEE Transactions on Computational Social Systems (Social Sciences: Social Sciences (miscellaneous))

CiteScore: 10.00

Self-citation rate: 20.00%

Articles published: 316

About the journal: IEEE Transactions on Computational Social Systems focuses on topics such as the modeling, simulation, analysis, and understanding of social systems from a quantitative and/or computational perspective. "Systems" include man-man, man-machine, and machine-machine organizations and adversarial situations, as well as social media structures and their dynamics. More specifically, the journal publishes articles on modeling the dynamics of social systems, methodologies for incorporating and representing socio-cultural and behavioral aspects in computational modeling, analysis of social system behavior and structure, and paradigms for social systems modeling and simulation. The journal also features articles on social network dynamics, social intelligence and cognition, social systems design and architectures, socio-cultural modeling and representation, and computational behavior modeling, and their applications.