{"title":"Unveiling Privacy Risks in the Long Tail: Membership Inference in Class Skewness","authors":"Hailong Hu;Jun Pang;Yantao Li;Huafeng Qin","doi":"10.1109/TIFS.2025.3607261","DOIUrl":null,"url":null,"abstract":"Real-world datasets often exhibit long-tailed distributions, raising important questions about how privacy risks evolve when machine learning (ML) models are applied to such data. In this work, we present a comprehensive analysis of membership inference attacks in long-tailed scenarios, revealing significant privacy vulnerabilities in tail data. We begin by examining standard ML models trained on long-tailed datasets and identify three key privacy risk effects: amplification, convergence, and polarization. Building on these insights, we extend our analysis to state-of-the-art long-tailed learning methods, such as foundation model-based approaches, offering new perspectives on how these models respond to membership inference attacks across head to tail classes. Finally, we investigate the privacy risks of ML models trained with differential privacy in long-tailed scenarios. Our findings corroborate that, even when ML models are designed to improve tail class performance to match head classes and are protected by differential privacy, tail class data remain particularly vulnerable to membership inference attacks.","PeriodicalId":13492,"journal":{"name":"IEEE Transactions on Information Forensics and Security","volume":"20 ","pages":"9507-9522"},"PeriodicalIF":8.0000,"publicationDate":"2025-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Information Forensics and Security","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11153515/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Citations: 0
Abstract
Real-world datasets often exhibit long-tailed distributions, raising important questions about how privacy risks evolve when machine learning (ML) models are applied to such data. In this work, we present a comprehensive analysis of membership inference attacks in long-tailed scenarios, revealing significant privacy vulnerabilities in tail data. We begin by examining standard ML models trained on long-tailed datasets and identify three key privacy risk effects: amplification, convergence, and polarization. Building on these insights, we extend our analysis to state-of-the-art long-tailed learning methods, such as foundation model-based approaches, offering new perspectives on how these models respond to membership inference attacks from head classes to tail classes. Finally, we investigate the privacy risks of ML models trained with differential privacy in long-tailed scenarios. Our findings confirm that, even when ML models are designed to improve tail class performance to match head classes and are protected by differential privacy, tail class data remain particularly vulnerable to membership inference attacks.
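To make the setting concrete, below is a minimal, self-contained sketch of a loss-threshold membership inference attack (in the spirit of Yeom et al., 2018) evaluated per class on a synthetic long-tailed dataset. Everything here is an illustrative assumption rather than the paper's experimental setup: the synthetic features, the logistic regression target model, the exponential imbalance ratio, and the mean-training-loss threshold are all stand-ins chosen for brevity.

# Sketch only: a per-class loss-threshold membership inference attack on a
# synthetic long-tailed dataset. Not the paper's attack or data; all
# parameters below are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Long-tailed subsampling: class c keeps roughly n_max * ratio**c examples.
n_classes, n_max, ratio = 10, 2000, 0.6
X, y = make_classification(n_samples=n_classes * n_max, n_features=20,
                           n_informative=10, n_classes=n_classes,
                           n_clusters_per_class=1, random_state=0)
keep = np.concatenate([
    rng.choice(np.flatnonzero(y == c),
               size=min(int(n_max * ratio**c), int((y == c).sum())),
               replace=False)
    for c in range(n_classes)])
X, y = X[keep], y[keep]

# Random disjoint split into members (training data) and non-members.
idx = rng.permutation(len(y))
members, nonmembers = idx[: len(y) // 2], idx[len(y) // 2:]
model = LogisticRegression(max_iter=2000).fit(X[members], y[members])

def per_example_loss(model, X, y):
    # Cross-entropy loss of each example under the trained model.
    p = model.predict_proba(X)[np.arange(len(y)), y]
    return -np.log(np.clip(p, 1e-12, None))

loss_in = per_example_loss(model, X[members], y[members])
loss_out = per_example_loss(model, X[nonmembers], y[nonmembers])

# Attack rule: predict "member" when an example's loss falls below a global
# threshold; the mean training loss is a common simple heuristic.
tau = loss_in.mean()
for c in range(n_classes):
    tpr = (loss_in[y[members] == c] < tau).mean()      # members detected
    fpr = (loss_out[y[nonmembers] == c] < tau).mean()  # false alarms
    print(f"class {c:2d} (n_train={(y[members] == c).sum()}): "
          f"attack advantage = {tpr - fpr:+.3f}")

On a skewed split like this, one would typically expect the per-class attack advantage (true-positive rate minus false-positive rate) to be larger for the underrepresented tail classes, echoing the amplification effect the abstract describes.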
About the journal:
The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.