Protecting User Privacy: Obfuscating Discriminative Spatio-Temporal Footprints

Proceedings of the 1st ACM SIGSPATIAL Workshop on Recommendations for Location-based Services and Social Networks. Pub Date: 2017-11-07. DOI: 10.1145/3148150.3148152

Jinhyung D. Park, E. Seglem, Eric Lin, Andreas Züfle


Citations: 7

Abstract

In recent years, applications that collect and store location data have become ubiquitous, allowing users to engage in a variety of interactions with other users and services in their digital or physical vicinity. However, use of these geolocation services puts users at risk of serious privacy threats. For instance, state-of-the-art user-identification methods use geospatial trajectories derived from location-based services to identify users with alarmingly high accuracy. In this work, we address the problem of protecting user identities by presenting methods for obfuscating discriminative location data in users' profiles. We use data provided by the public Twitter API, collecting geotagged tweets from a select group of prolific users over a 12-week period. To minimize the amount of data obfuscated, we present two methods for identifying the most discriminative tweets. The first uses an Entropy-Maximizing Observation Function based on the number of tweets the user has posted and the number of people who have posted at that specific location. This ensures that tweets by infrequent users in unique locations are changed first. The second runs the identification algorithm to determine which users can be identified and changes only tweets from those users. For both methods, to perturb a tweet, we move it to a location with more tweets to mask the identity of the user. Thorough experiments against baseline approaches show that our model significantly decreases user identification accuracy while keeping the percentage of changed data to a minimum.
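The first method can be illustrated with a minimal Python sketch. The abstract does not give the formula of the Entropy-Maximizing Observation Function, so the score used here (the user's tweet count multiplied by the number of distinct users at the location) is a hypothetical stand-in, not the authors' function. The grid-cell representation, the Manhattan-distance choice of target location, and the names `rank_discriminative` and `obfuscate` are likewise assumptions made only for illustration.

```python
from collections import Counter, defaultdict


def rank_discriminative(tweets):
    """Return tweet indices ordered from most to least discriminative.

    `tweets` is a list of (user, cell) pairs, where `cell` is an (x, y)
    grid coordinate. The score is a hypothetical stand-in: tweets by
    infrequent users in rarely shared cells get the lowest score and
    therefore come first.
    """
    tweets_per_user = Counter(user for user, _ in tweets)
    users_per_cell = defaultdict(set)
    for user, cell in tweets:
        users_per_cell[cell].add(user)

    def score(i):
        user, cell = tweets[i]
        return tweets_per_user[user] * len(users_per_cell[cell])

    return sorted(range(len(tweets)), key=score)


def obfuscate(tweets, budget):
    """Move at most `budget` of the most discriminative tweets.

    Each selected tweet is moved to the nearest cell (Manhattan distance,
    an assumption) that holds strictly more tweets than its current cell.
    Tweet counts are taken from the original data and not updated.
    """
    tweets_per_cell = Counter(cell for _, cell in tweets)
    result = list(tweets)
    for i in rank_discriminative(tweets)[:budget]:
        user, cell = tweets[i]
        busier = [c for c in tweets_per_cell
                  if tweets_per_cell[c] > tweets_per_cell[cell]]
        if busier:
            target = min(
                busier,
                key=lambda c: (abs(c[0] - cell[0]) + abs(c[1] - cell[1]), c),
            )
            result[i] = (user, target)
    return result


sample = [
    ("alice", (0, 0)), ("alice", (0, 0)), ("alice", (0, 0)),
    ("bob", (0, 0)), ("bob", (0, 0)),
    ("carol", (5, 5)),
]
obfuscated = obfuscate(sample, budget=1)
```

In this toy data, the single tweet by the hypothetical user `carol` in an otherwise empty cell ranks as most discriminative and is the one moved when the budget is 1. The `budget` parameter reflects the stated goal of keeping the amount of changed data small.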

