Privacy Utility Tradeoff Between PETs: Differential Privacy and Synthetic Data

Authors: Qaiser Razi; Sujoya Datta; Vikas Hassija; GSS Chalapathi; Biplab Sikdar
Journal: IEEE Transactions on Computational Social Systems, vol. 12, no. 2, pp. 473-484
DOI: 10.1109/TCSS.2024.3479317
Published: 2024-11-14 (Journal Article)
URL: https://ieeexplore.ieee.org/document/10753017/
Data privacy is a critical concern in the digital age. This problem has compounded with the evolution and increased adoption of machine learning (ML), which has necessitated balancing the security of sensitive information with model utility. Traditional data privacy techniques, such as differential privacy and anonymization, focus on protecting data at rest and in transit but often fail to maintain high utility for machine learning models due to their impact on data accuracy. In this article, we explore the use of synthetic data as a privacy-preserving method that can effectively balance data privacy and utility. Synthetic data is generated to replicate the statistical properties of the original dataset while obscuring identifying details, offering enhanced privacy guarantees. We evaluate the performance of synthetic data against differentially private and anonymized data in terms of prediction accuracy across various settings—different learning rates, network architectures, and datasets from various domains. Our findings demonstrate that synthetic data maintains higher utility (prediction accuracy) than differentially private and anonymized data. The study underscores the potential of synthetic data as a robust privacy-enhancing technology (PET) capable of preserving both privacy and data utility in machine learning environments.
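To make the comparison in the abstract concrete, the sketch below contrasts the two PETs on a toy dataset: input perturbation with Laplace noise (the mechanism commonly used for differential privacy; adding per-record noise as done here is a simple local-DP-style scheme) versus synthetic data drawn from a distribution fitted to the original. The dataset, the Gaussian data model, and the epsilon value are illustrative assumptions, not the paper's actual experimental setup.

```python
import math
import random
import statistics

random.seed(0)

# Toy "sensitive" dataset: 1000 numeric records (illustrative only;
# the paper's datasets, models, and hyperparameters are not shown here).
original = [random.gauss(40.0, 12.0) for _ in range(1000)]

# --- Differential privacy via input perturbation ---
# The Laplace mechanism adds Laplace(0, sensitivity/epsilon) noise;
# smaller epsilon means more noise: stronger privacy, lower utility.
def laplace(scale: float) -> float:
    # Inverse-CDF sampling of a Laplace distribution using stdlib random.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

epsilon, sensitivity = 0.5, 1.0
dp_data = [x + laplace(sensitivity / epsilon) for x in original]

# --- Synthetic data via distribution fitting ---
# Fit simple summary statistics, then resample: the synthetic set
# replicates the original's mean/std without copying any record.
mu = statistics.fmean(original)
sigma = statistics.stdev(original)
synthetic = [random.gauss(mu, sigma) for _ in range(len(original))]

# Crude utility proxy: how well each privatized set preserves the mean.
print("DP mean error:       ", abs(statistics.fmean(dp_data) - mu))
print("Synthetic mean error:", abs(statistics.fmean(synthetic) - mu))
```

In the article's experiments the utility metric is prediction accuracy of ML models trained on the privatized data, not a summary statistic; this sketch only illustrates the underlying tradeoff mechanism, where DP noise directly distorts record values while synthetic generation preserves aggregate structure.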
Journal Introduction:
IEEE Transactions on Computational Social Systems focuses on topics such as the modeling, simulation, analysis, and understanding of social systems from a quantitative and/or computational perspective. "Systems" include human-human, human-machine, and machine-machine organizations and adversarial situations, as well as social media structures and their dynamics. More specifically, the Transactions publishes articles on modeling the dynamics of social systems, methodologies for incorporating and representing sociocultural and behavioral aspects in computational modeling, analysis of social system behavior and structure, and paradigms for social systems modeling and simulation. The journal also features articles on social network dynamics, social intelligence and cognition, social systems design and architectures, sociocultural modeling and representation, computational behavior modeling, and their applications.