{"title":"自监督学习特征的对抗鲁棒性","authors":"Nicholas Mehlman;Shri Narayanan","doi":"10.1109/OJSP.2025.3562797","DOIUrl":null,"url":null,"abstract":"As deep learning models have proliferated, concerns about their reliability and security have also increased. One significant challenge is understanding adversarial perturbations, which can alter a model's predictions despite being very small in magnitude. Prior work has proposed that this phenomenon results from a fundamental deficit in supervised learning, by which classifiers exploit whatever input features are more predictive, regardless of whether or not these features are robust to adversarial attacks. In this paper, we consider feature robustness in the context of contrastive self-supervised learning methods that have become especially common in recent years. Our findings suggest that the features learned during self-supervision are, in fact, more resistant to adversarial perturbations than those generated from supervised learning. However, we also find that these self-supervised features exhibit poorer inter-class disentanglement, limiting their contribution to overall classifier robustness.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"468-477"},"PeriodicalIF":2.9000,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10971198","citationCount":"0","resultStr":"{\"title\":\"Adversarial Robustness of Self-Supervised Learning Features\",\"authors\":\"Nicholas Mehlman;Shri Narayanan\",\"doi\":\"10.1109/OJSP.2025.3562797\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As deep learning models have proliferated, concerns about their reliability and security have also increased. One significant challenge is understanding adversarial perturbations, which can alter a model's predictions despite being very small in magnitude. Prior work has proposed that this phenomenon results from a fundamental deficit in supervised learning, by which classifiers exploit whatever input features are more predictive, regardless of whether or not these features are robust to adversarial attacks. In this paper, we consider feature robustness in the context of contrastive self-supervised learning methods that have become especially common in recent years. Our findings suggest that the features learned during self-supervision are, in fact, more resistant to adversarial perturbations than those generated from supervised learning. 
However, we also find that these self-supervised features exhibit poorer inter-class disentanglement, limiting their contribution to overall classifier robustness.\",\"PeriodicalId\":73300,\"journal\":{\"name\":\"IEEE open journal of signal processing\",\"volume\":\"6 \",\"pages\":\"468-477\"},\"PeriodicalIF\":2.9000,\"publicationDate\":\"2025-04-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10971198\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE open journal of signal processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10971198/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE open journal of signal processing","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10971198/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Abstract: As deep learning models have proliferated, concerns about their reliability and security have grown with them. One significant challenge is understanding adversarial perturbations, which can alter a model's predictions despite being very small in magnitude. Prior work has proposed that this phenomenon stems from a fundamental deficit in supervised learning: classifiers exploit whichever input features are most predictive, regardless of whether those features are robust to adversarial attacks. In this paper, we consider feature robustness in the context of contrastive self-supervised learning methods, which have become especially common in recent years. Our findings suggest that the features learned during self-supervision are, in fact, more resistant to adversarial perturbations than those produced by supervised learning. However, we also find that these self-supervised features exhibit poorer inter-class disentanglement, limiting their contribution to overall classifier robustness.
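For illustration only (this is not code from the paper), the sketch below shows one standard way such small-magnitude perturbations are generated: the fast gradient sign method (FGSM), which nudges every input dimension by at most epsilon in the direction that increases the classifier's loss. The model, the epsilon value, and the assumption of pixel values in [0, 1] are placeholders chosen for the example, not details taken from the paper's experiments.

import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_perturb(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                 epsilon: float = 8 / 255) -> torch.Tensor:
    # Return a copy of x perturbed within an L-infinity ball of radius epsilon.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # One signed-gradient step that increases the loss, then clamp to the
    # assumed [0, 1] pixel range so the result is still a valid input.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

# Hypothetical usage: even with epsilon = 8/255, often imperceptible for
# images, the predicted class of an undefended classifier frequently flips.
# x_adv = fgsm_perturb(classifier, images, labels)
# flipped = classifier(x_adv).argmax(1) != classifier(images).argmax(1)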