Collecting Multi-type and Correlation-Constrained Streaming Sensor Data with Local Differential Privacy

IF 4.7, Region 4 (Computer Science), JCR Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS)

ACM Transactions on Sensor Networks, Pub Date: 2023-09-13, DOI: 10.1145/3623637

Yue Fu, Qingqing Ye, Rong Du, Haibo Hu

Citations: 0

Abstract

Collecting Multi-type and Correlation-Constrained Streaming Sensor Data with Local Differential Privacy

Local differential privacy (LDP) is a promising privacy model for distributed data collection. It has been widely deployed in real-world systems (e.g., Chrome, iOS, macOS). In LDP-based mechanisms, each user perturbs their private values locally, and an aggregator collects the perturbed values and analyses them to estimate statistics such as frequency and mean. Most existing works focus on simple scalar value types, such as boolean and categorical values. However, with the emergence of smart sensors and the Internet of Things, high-dimensional data are becoming increasingly common. In many cases where more than one type of sensor data is collected simultaneously, correlations exist between the attributes of such data, e.g., temperature and luminance. To ensure LDP for high-dimensional data, existing solutions either partition the privacy budget ϵ among these correlated attributes or adopt sampling, both of which dilute the density of useful information and thus result in poor data utility. In this paper, we propose a relaxed LDP model, namely, univariate dominance local differential privacy (UDLDP), for high-dimensional data. We quantify the correlations between attributes and present a correlation-bounded perturbation (CBP) mechanism that optimizes the partitioning of the privacy budget across correlated attributes. Furthermore, we extend CBP to support sampling, which is a common bandwidth reduction technique in sensor networks and the Internet of Things. We derive the best allocation strategy of sampling probabilities among attributes in terms of data utility, which leads to the correlation-bounded perturbation mechanism with sampling (CBPS). Finally, we discuss how to collect and leverage the correlation from real-time data streams with a by-round algorithm to enhance utility. The performance of the proposed mechanisms is evaluated and compared with state-of-the-art LDP mechanisms on real-world and synthetic datasets.
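To make the two baseline strategies named in the abstract concrete, the following is a minimal sketch of budget partitioning versus sampling for mean estimation over multi-attribute records. It is not the paper's UDLDP, CBP, or CBPS mechanism. It assumes each attribute is normalized to [-1, 1], uses the standard Laplace mechanism as the per-attribute perturbation, splits the budget evenly, and samples one attribute uniformly; the function names are hypothetical and chosen only for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb(value, epsilon, rng):
    # Laplace mechanism for a value in [-1, 1] (assumed range).
    # The sensitivity is 2, so the noise scale is 2 / epsilon.
    return value + laplace_noise(2.0 / epsilon, rng)


def split_budget_means(records, epsilon, rng):
    # Baseline 1: every user reports all d attributes, each perturbed
    # with an equal share epsilon / d of the total budget.
    d = len(records[0])
    sums = [0.0] * d
    for rec in records:
        for j, v in enumerate(rec):
            sums[j] += perturb(v, epsilon / d, rng)
    return [s / len(records) for s in sums]


def sampling_means(records, epsilon, rng):
    # Baseline 2: every user reports one uniformly chosen attribute,
    # perturbed with the full budget epsilon.
    d = len(records[0])
    sums = [0.0] * d
    counts = [0] * d
    for rec in records:
        j = rng.randrange(d)
        sums[j] += perturb(rec[j], epsilon, rng)
        counts[j] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic (temperature, luminance) pairs, already scaled to [-1, 1].
    data = []
    for _ in range(20000):
        t = rng.uniform(-1.0, 1.0)
        data.append((t, max(-1.0, min(1.0, 0.8 * t + rng.uniform(-0.2, 0.2)))))
    print("split budget:", split_budget_means(data, 1.0, rng))
    print("sampling:    ", sampling_means(data, 1.0, rng))
```

Both baselines treat the attributes as if they were unrelated: the first pays for every attribute with a smaller budget share, the second discards all but one attribute per report. Neither uses the correlation between attributes, which is the gap the abstract says CBP and CBPS are designed to address.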

Source journal

ACM Transactions on Sensor Networks (Engineering & Technology: Telecommunications)

CiteScore: 5.90
Self-citation rate: 7.30%
Articles published: 131
Review time: 6 months

Journal description: ACM Transactions on Sensor Networks (TOSN) is a central publication by the ACM in the interdisciplinary area of sensor networks spanning a broad discipline from signal processing, networking and protocols, embedded systems, information management, to distributed algorithms. It covers research contributions that introduce new concepts, techniques, analyses, or architectures, as well as applied contributions that report on development of new tools and systems or experiences and experiments with high-impact, innovative applications. The Transactions places special attention on contributions to systemic approaches to sensor networks as well as fundamental contributions.