{"title":"Unveiling Privacy Risks in the Long Tail: Membership Inference in Class Skewness","authors":"Hailong Hu;Jun Pang;Yantao Li;Huafeng Qin","doi":"10.1109/TIFS.2025.3607261","DOIUrl":null,"url":null,"abstract":"Real-world datasets often exhibit long-tailed distributions, raising important questions about how privacy risks evolve when machine learning (ML) models are applied to such data. In this work, we present a comprehensive analysis of membership inference attacks in long-tailed scenarios, revealing significant privacy vulnerabilities in tail data. We begin by examining standard ML models trained on long-tailed datasets and identify three key privacy risk effects: amplification, convergence, and polarization. Building on these insights, we extend our analysis to state-of-the-art long-tailed learning methods, such as foundation model-based approaches, offering new perspectives on how these models respond to membership inference attacks across head to tail classes. Finally, we investigate the privacy risks of ML models trained with differential privacy in long-tailed scenarios. Our findings corroborate that, even when ML models are designed to improve tail class performance to match head classes and are protected by differential privacy, tail class data remain particularly vulnerable to membership inference attacks.","PeriodicalId":13492,"journal":{"name":"IEEE Transactions on Information Forensics and Security","volume":"20 ","pages":"9507-9522"},"PeriodicalIF":8.0000,"publicationDate":"2025-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Information Forensics and Security","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11153515/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Citations: 0
Abstract
Real-world datasets often exhibit long-tailed distributions, raising important questions about how privacy risks evolve when machine learning (ML) models are applied to such data. In this work, we present a comprehensive analysis of membership inference attacks in long-tailed scenarios, revealing significant privacy vulnerabilities in tail data. We begin by examining standard ML models trained on long-tailed datasets and identify three key privacy risk effects: amplification, convergence, and polarization. Building on these insights, we extend our analysis to state-of-the-art long-tailed learning methods, such as foundation model-based approaches, offering new perspectives on how these models respond to membership inference attacks from head classes to tail classes. Finally, we investigate the privacy risks of ML models trained with differential privacy in long-tailed scenarios. Our findings confirm that, even when ML models are designed to improve tail class performance to match head classes and are protected by differential privacy, tail class data remain particularly vulnerable to membership inference attacks.
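To make the setting concrete, below is a minimal, self-contained sketch of a loss-threshold membership inference attack (in the spirit of Yeom et al., 2018) evaluated per class on a synthetic long-tailed dataset. Everything here is an illustrative assumption rather than the paper's experimental setup: the synthetic features, the logistic regression target model, the exponential imbalance ratio, and the mean-training-loss threshold are all stand-ins chosen for brevity.

# Sketch only: a per-class loss-threshold membership inference attack on a
# synthetic long-tailed dataset. Not the paper's attack or data; all
# parameters below are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Long-tailed subsampling: class c keeps roughly n_max * ratio**c examples.
n_classes, n_max, ratio = 10, 2000, 0.6
X, y = make_classification(n_samples=n_classes * n_max, n_features=20,
                           n_informative=10, n_classes=n_classes,
                           n_clusters_per_class=1, random_state=0)
keep = np.concatenate([
    rng.choice(np.flatnonzero(y == c),
               size=min(int(n_max * ratio**c), int((y == c).sum())),
               replace=False)
    for c in range(n_classes)])
X, y = X[keep], y[keep]

# Random disjoint split into members (training data) and non-members.
idx = rng.permutation(len(y))
members, nonmembers = idx[: len(y) // 2], idx[len(y) // 2:]
model = LogisticRegression(max_iter=2000).fit(X[members], y[members])

def per_example_loss(model, X, y):
    # Cross-entropy loss of each example under the trained model.
    p = model.predict_proba(X)[np.arange(len(y)), y]
    return -np.log(np.clip(p, 1e-12, None))

loss_in = per_example_loss(model, X[members], y[members])
loss_out = per_example_loss(model, X[nonmembers], y[nonmembers])

# Attack rule: predict "member" when an example's loss falls below a global
# threshold; the mean training loss is a common simple heuristic.
tau = loss_in.mean()
for c in range(n_classes):
    tpr = (loss_in[y[members] == c] < tau).mean()      # members detected
    fpr = (loss_out[y[nonmembers] == c] < tau).mean()  # false alarms
    print(f"class {c:2d} (n_train={(y[members] == c).sum()}): "
          f"attack advantage = {tpr - fpr:+.3f}")

On a skewed split like this, one would typically expect the per-class attack advantage (true-positive rate minus false-positive rate) to be larger for the underrepresented tail classes, echoing the amplification effect the abstract describes.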
About the journal:
The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.