Where you go is who you are: a study on machine learning based semantic privacy attacks

Impact factor 6.4; Zone 2, Computer Science; JCR Q1, COMPUTER SCIENCE, THEORY & METHODS

Journal of Big Data Pub Date : 2024-03-12 DOI:10.1186/s40537-024-00888-8

Nina Wiedemann, Krzysztof Janowicz, Martin Raubal, Ourania Kounadi


Citations: 0

Abstract

Concerns about data privacy are omnipresent, given the increasing usage of digital applications and their underlying business model, which includes selling user data. Location data are particularly sensitive since they allow us to infer activity patterns and interests of users, e.g., by categorizing visited locations based on nearby points of interest (POIs). On top of that, machine learning methods provide powerful new tools to interpret big data. In light of these considerations, we raise the following question: What is the actual risk that realistic, machine learning based privacy attacks can obtain meaningful semantic information from raw location data, subject to inaccuracies in the data? In response, we present a systematic analysis of two attack scenarios, namely location categorization and user profiling. Experiments on the Foursquare dataset and tracking data demonstrate the potential for abuse of high-quality spatial information, leading to a significant privacy loss even with location inaccuracy of up to 200 m. With location obfuscation of more than 1 km, spatial information hardly adds any value, but a high privacy risk solely from temporal information remains. The availability of public context data such as POIs plays a key role in inference based on spatial information. Our findings point out the risks of ever-growing databases of tracking data and spatial context data, which policymakers should consider for privacy regulations, and which could guide individuals in their personal location protection measures.
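To make the idea of categorizing a visited location from nearby POIs more concrete, the following is a minimal sketch and not the authors' method: it replaces the machine learning models of the paper with a simple majority vote over POI categories within a fixed radius, and it models location inaccuracy as a deterministic shift along a compass bearing. The coordinates, POI categories, the 100 m radius, and all function names (`haversine_m`, `displace`, `guess_category`) are assumptions made up for illustration.

```python
import math
from collections import Counter

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def displace(lat, lon, distance_m, bearing_deg):
    """Shift a point by distance_m along bearing_deg (flat-earth approximation,
    adequate for the sub-kilometre to kilometre offsets used here)."""
    theta = math.radians(bearing_deg)
    dlat = distance_m * math.cos(theta) / EARTH_RADIUS_M
    dlon = distance_m * math.sin(theta) / (EARTH_RADIUS_M * math.cos(math.radians(lat)))
    return lat + math.degrees(dlat), lon + math.degrees(dlon)


def guess_category(lat, lon, pois, radius_m=100.0):
    """Return the most frequent category among POIs within radius_m, or None.

    This majority vote is a stand-in for a learned classifier (assumption).
    """
    nearby = [cat for (plat, plon, cat) in pois
              if haversine_m(lat, lon, plat, plon) <= radius_m]
    if not nearby:
        return None
    return Counter(nearby).most_common(1)[0][0]


# Hypothetical toy data: a visited place and a handful of invented POIs around it.
BASE = (47.3769, 8.5417)
POIS = [
    (*displace(*BASE, 20, 0), "Dining"),
    (*displace(*BASE, 40, 90), "Dining"),
    (*displace(*BASE, 60, 180), "Shopping"),
    (*displace(*BASE, 1000, 90), "Sports"),
    (*displace(*BASE, 1020, 90), "Sports"),
]

if __name__ == "__main__":
    for offset_m in (0, 50, 1000):
        noisy = displace(*BASE, offset_m, 90)  # shift the reported location eastwards
        print(offset_m, guess_category(*noisy, POIS))
```

In this toy setup a small offset leaves the inferred category unchanged, while a 1 km offset moves the reported point into a different POI neighbourhood. It only illustrates the mechanism and says nothing about the magnitudes reported in the paper.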




Source journal

Journal of Big Data (Computer Science-Information Systems)

CiteScore: 17.80
Self-citation rate: 3.70%
Articles published: 105
Review time: 13 weeks

About the journal: The Journal of Big Data publishes high-quality, scholarly research papers, methodologies, and case studies covering a broad spectrum of topics, from big data analytics to data-intensive computing and all applications of big data research. It addresses challenges facing big data today and in the future, including data capture and storage, search, sharing, analytics, technologies, visualization, architectures, data mining, machine learning, cloud computing, distributed systems, and scalable storage. The journal serves as a seminal source of innovative material for academic researchers and practitioners alike.