Preserving Privacy in Fine-Grained Data Distillation With Sparse Answers for Efficient Edge Computing
Ke Pan, Maoguo Gong, Kaiyuan Feng, Hui Li
IEEE Internet of Things Journal, vol. 12, no. 8, pp. 10058-10069, published 2024-12-09
DOI: 10.1109/JIOT.2024.3508804
https://ieeexplore.ieee.org/document/10786879/
JCR: Q1 (Computer Science, Information Systems); impact factor: 8.9
Citation count: 0
Abstract
In the Internet of Things (IoT), data distillation is regarded as a key method for condensing an original real dataset into a tiny synthetic dataset that reduces the training burden while preserving as much data utility as possible for training deep learning models. However, the data synthesis process may memorize sensitive information about the original dataset, raising privacy concerns for data owners. To address this problem, we present a novel differential privacy (DP)-based data distillation algorithm. Specifically, in the data distillation phase, we randomly pick a training model from a model pool in each epoch and then perform fine-grained distribution matching to generate informative synthetic data that improves task-oriented model performance. In the privacy preservation phase, we use the sparse vector technique to selectively perturb the input features that are more important for model training, which protects the sensitive information contained in the original dataset while reducing privacy costs. Extensive experiments on several real-world datasets demonstrate that our algorithm can achieve higher data utility and model accuracy than existing solutions.
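The abstract does not give the algorithm's details. As a rough illustration of the idea behind the privacy preservation phase only, the Python sketch below uses a standard textbook variant of the sparse vector technique (SVT) to decide which features count as "important" (at most `c` positive answers), then adds Laplace noise only to those features. This is not the authors' algorithm, and it omits the distillation phase entirely. The function names, the per-feature importance scores, the threshold, and both sensitivity parameters are hypothetical assumptions: `sensitivity` is assumed to bound how much any single importance score can change when one record changes, and `l1_sensitivity` is assumed to bound the L1 change of the selected feature values. Under sequential composition the sketch spends its selection budget plus its perturbation budget; entries left unperturbed are returned exactly and are not covered by that accounting.

```python
import random
from typing import List, Sequence


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials.

    `scale` must be positive.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def sparse_vector_select(
    scores: Sequence[float],
    threshold: float,
    c: int,
    epsilon: float,
    sensitivity: float,
    rng: random.Random,
) -> List[int]:
    """Hypothetical helper: indices of at most c scores above a noisy threshold.

    Standard SVT variant: half of epsilon noises the threshold once, the other
    half noises every query answer with scale 2*c*sensitivity/eps2. Only
    positive answers count toward the cutoff c.
    """
    eps1 = epsilon / 2.0
    eps2 = epsilon - eps1
    noisy_threshold = threshold + laplace(sensitivity / eps1, rng)
    selected: List[int] = []
    for i, score in enumerate(scores):
        noisy_score = score + laplace(2.0 * c * sensitivity / eps2, rng)
        if noisy_score >= noisy_threshold:
            selected.append(i)
            if len(selected) >= c:
                break  # cutoff reached: stop answering further queries
    return selected


def perturb_selected(
    features: Sequence[float],
    selected: Sequence[int],
    epsilon: float,
    l1_sensitivity: float,
    rng: random.Random,
) -> List[float]:
    """Hypothetical helper: Laplace mechanism applied only to selected positions."""
    out = list(features)
    scale = l1_sensitivity / epsilon
    for i in selected:
        out[i] += laplace(scale, rng)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    importance = [0.9, 0.1, 0.8, 0.05, 0.7]  # made-up importance scores
    features = [1.0, 2.0, 3.0, 4.0, 5.0]     # made-up feature values
    chosen = sparse_vector_select(importance, 0.5, 2, 1.0, 0.05, rng)
    print("selected:", chosen)
    print("released:", perturb_selected(features, chosen, 1.0, 1.0, rng))
```

Only the positive ("above threshold") answers consume the cutoff `c`, which is why SVT suits settings where few of many queries are expected to be answered positively; the remaining queries can be examined without each one adding its own privacy cost.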
About the journal:
The IEEE Internet of Things (IoT) Journal publishes articles and review articles covering various aspects of IoT, including IoT system architecture, IoT enabling technologies, IoT communication and networking protocols such as network coding, and IoT services and applications. Topics encompass IoT's impacts on sensor technologies, big data management, and future Internet design for applications like smart cities and smart homes. Fields of interest include IoT architecture such as things-centric, data-centric, and service-oriented IoT architecture; IoT enabling technologies and systematic integration such as sensor technologies, big sensor data management, and future Internet design for IoT; IoT services, applications, and test-beds such as IoT service middleware, IoT application programming interfaces (APIs), IoT application design, and IoT trials/experiments; and IoT standardization activities and technology development in different standard development organizations (SDOs) such as IEEE, IETF, ITU, 3GPP, and ETSI.