KPN-anonymity: Extension of K-anonymity for user anonymity evaluation on web applications

Impact factor: 4.5; JCR Q2 (Computer Science, Theory & Methods)

Array, vol. 27, Article 100499 (published 2025-09-01). DOI: 10.1016/j.array.2025.100499

Ángel Merino, Ángel Cuevas, Rubén Cuevas

Citations: 0

Abstract

User data powers much of the Internet nowadays. Beyond personally identifiable information (PII), online systems routinely collect several non-PII user attributes. In world-scale datasets, a user may share their combination of attribute values with only a few others, or may even be the only individual matching a specific combination. This makes it challenging to evaluate and compare overall anonymity across systems, because classical K-anonymity, which is set by the least anonymous user, would often be 1. To address this issue, we introduce a general methodology for assessing user anonymity in arbitrary datasets. Our approach adapts K-anonymity to focus on most users rather than on the least anonymous ones, and proposes the metric $K_P^N$: the minimum anonymity (K) among the P% most anonymous users, where anonymity is defined by N attributes. This metric enables the comparison of anonymity levels across systems, helping to identify risks and to evaluate the impact of changes such as redesigning attribute granularity.
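The definition can be read operationally. Below is a minimal sketch, assuming the data are individual user records and that a user's anonymity is the number of users (the user included) who share the same values on the N chosen attributes, as in classical K-anonymity. The sketch ranks users from most to least anonymous and returns the anonymity of the last user inside the top P%. In symbols, with $M$ users and anonymities sorted so that $a_{(1)} \ge \dots \ge a_{(M)}$, this reading gives $K_P^N = a_{(\lceil PM/100 \rceil)}$. The functions `user_anonymity` and `kpn`, the record format, and the ceiling rounding are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def user_anonymity(records, attributes):
    """Anonymity of each user: how many users (the user included) share
    exactly the same values on the given attributes."""
    keys = [tuple(r[a] for a in attributes) for r in records]
    group_size = Counter(keys)
    return [group_size[k] for k in keys]


def kpn(records, attributes, p):
    """Minimum anonymity among the p% most anonymous users (0 < p <= 100)."""
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    anon = sorted(user_anonymity(records, attributes), reverse=True)
    if not anon:
        raise ValueError("no users")
    # Size of the "most anonymous p%" slice, rounded up to whole users.
    cutoff = math.ceil(len(anon) * p / 100)
    # The list is sorted in descending order, so the slice minimum is its last item.
    return anon[cutoff - 1]


# Toy dataset with N = 2 attributes: groups of 4, 3, 2 and 1 users.
records = (
    [{"gender": "f", "age": "18-24"}] * 4
    + [{"gender": "m", "age": "18-24"}] * 3
    + [{"gender": "f", "age": "25-34"}] * 2
    + [{"gender": "m", "age": "25-34"}]
)
attributes = ["gender", "age"]

print(kpn(records, attributes, 100))  # 1 (classical K-anonymity)
print(kpn(records, attributes, 50))   # 3
```

Setting P = 100 recovers classical K-anonymity (1 in this toy data, because one user is unique). With P = 50, the output shows that each of the 50% most anonymous users is hidden among at least 3 users.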

We demonstrate the applicability of this metric through a case study of three digital platforms, Meta, LinkedIn, and Twitter (X), using audience data from their advertising systems. We define three common attributes and one platform-specific attribute to reflect each platform's unique segmentation capabilities. By examining all possible combinations and applying our metric, we show that Twitter provides the highest level of anonymity and Meta the lowest. We also study how a specific change in the cardinality of the age attribute can increase anonymity on Meta by more than a factor of 10. This case illustrates the utility of our metric for assessing and comparing anonymity risks in real-world data systems.
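The case study works from advertising audience estimates rather than from user records. Below is a minimal sketch of how this could work, assuming the platform reports one audience size per combination of attribute values. Every user in a combination of size c has anonymity c, so $K_P^N$ follows from the sorted sizes alone. Coarsening an attribute merges combinations, which can only keep or raise each user's anonymity. Here, single ages are merged into one range, in the spirit of the age-cardinality change studied for Meta. The audience figures, attribute values, and the helpers `kpn_from_counts` and `coarsen` are hypothetical and do not reproduce the paper's data.

```python
import math
from collections import Counter


def kpn_from_counts(cell_sizes, p):
    """K_P^N from the sizes of the attribute-value combinations (cells).

    Every user in a cell of size c has anonymity c, so cells are visited from
    largest to smallest until p% of all users are covered.
    """
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    sizes = sorted((c for c in cell_sizes if c > 0), reverse=True)
    if not sizes:
        raise ValueError("no users")
    target = math.ceil(sum(sizes) * p / 100)  # number of users in the top p%
    covered = 0
    for c in sizes:
        covered += c
        if covered >= target:
            return c
    return sizes[-1]  # not reached: covered ends at sum(sizes) >= target


def coarsen(audience, attr_index, mapping):
    """Merge values of one attribute (e.g. single ages into an age range)
    and add up the sizes of the cells that become identical."""
    merged = Counter()
    for combo, size in audience.items():
        key = list(combo)
        key[attr_index] = mapping.get(key[attr_index], key[attr_index])
        merged[tuple(key)] += size
    return merged


# Hypothetical audience estimates for every (gender, age) combination.
audience = {
    ("f", "18"): 50, ("f", "19"): 40, ("f", "20"): 10,
    ("m", "18"): 45, ("m", "19"): 5, ("m", "20"): 2,
}
age_ranges = {"18": "18-20", "19": "18-20", "20": "18-20"}
coarse = coarsen(audience, attr_index=1, mapping=age_ranges)

print(kpn_from_counts(audience.values(), 90))  # 10
print(kpn_from_counts(coarse.values(), 90))    # 52
```

In this invented example, merging the three ages raises $K_{90}^{2}$ from 10 to 52. The magnitude says nothing about the real platforms.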

Source journal: Array (Computer Science - General Computer Science)

CiteScore: 4.40

Self-citation rate: 0.00%

Review time: 45 days