Sina Shaham, Gabriel Ghinita, Ritesh Ahuja, John Krumm, Cyrus Shahabi
{"title":"HTF: Homogeneous Tree Framework for Differentially-Private Release of Large Geospatial Datasets with Self-Tuning Structure Height.","authors":"Sina Shaham, Gabriel Ghinita, Ritesh Ahuja, John Krumm, Cyrus Shahabi","doi":"10.1145/3569087","DOIUrl":null,"url":null,"abstract":"<p><p>Mobile apps that use location data are pervasive, spanning domains such as transportation, urban planning and healthcare. Important use cases for location data rely on statistical queries, e.g., identifying hotspots where users work and travel. Such queries can be answered efficiently by building histograms. However, precise histograms can expose sensitive details about individual users. Differential privacy (DP) is a mature and widely-adopted protection model, but most approaches for DP-compliant histograms work in a data-independent fashion, leading to poor accuracy. The few proposed data-dependent techniques attempt to adjust histogram partitions based on dataset characteristics, but they do not perform well due to the addition of noise required to achieve DP. In addition, they use ad-hoc criteria to decide the depth of the partitioning. We identify <i>density homogeneity</i> as a main factor driving the accuracy of DP-compliant histograms, and we build a data structure that splits the space such that data density is homogeneous within each resulting partition. We propose a self-tuning approach to decide the depth of the partitioning structure that optimizes the use of privacy budget. Furthermore, we provide an optimization that scales the proposed split approach to large datasets while maintaining accuracy. We show through extensive experiments on large-scale real-world data that the proposed approach achieves superior accuracy compared to existing approaches.</p>","PeriodicalId":43641,"journal":{"name":"ACM Transactions on Spatial Algorithms and Systems","volume":"1 1","pages":""},"PeriodicalIF":1.2000,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10881200/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Spatial Algorithms and Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3569087","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2023/11/20 0:00:00","PubModel":"Epub","JCR":"Q4","JCRName":"REMOTE SENSING","Score":null,"Total":0}
引用次数: 0
Abstract
Mobile apps that use location data are pervasive, spanning domains such as transportation, urban planning and healthcare. Important use cases for location data rely on statistical queries, e.g., identifying hotspots where users work and travel. Such queries can be answered efficiently by building histograms. However, precise histograms can expose sensitive details about individual users. Differential privacy (DP) is a mature and widely-adopted protection model, but most approaches for DP-compliant histograms work in a data-independent fashion, leading to poor accuracy. The few proposed data-dependent techniques attempt to adjust histogram partitions based on dataset characteristics, but they do not perform well due to the addition of noise required to achieve DP. In addition, they use ad-hoc criteria to decide the depth of the partitioning. We identify density homogeneity as a main factor driving the accuracy of DP-compliant histograms, and we build a data structure that splits the space such that data density is homogeneous within each resulting partition. We propose a self-tuning approach to decide the depth of the partitioning structure that optimizes the use of privacy budget. Furthermore, we provide an optimization that scales the proposed split approach to large datasets while maintaining accuracy. We show through extensive experiments on large-scale real-world data that the proposed approach achieves superior accuracy compared to existing approaches.
期刊介绍:
ACM Transactions on Spatial Algorithms and Systems (TSAS) is a scholarly journal that publishes the highest quality papers on all aspects of spatial algorithms and systems and closely related disciplines. It has a multi-disciplinary perspective in that it spans a large number of areas where spatial data is manipulated or visualized (regardless of how it is specified - i.e., geometrically or textually) such as geography, geographic information systems (GIS), geospatial and spatiotemporal databases, spatial and metric indexing, location-based services, web-based spatial applications, geographic information retrieval (GIR), spatial reasoning and mining, security and privacy, as well as the related visual computing areas of computer graphics, computer vision, geometric modeling, and visualization where the spatial, geospatial, and spatiotemporal data is central.