Adversarial Robustness of Self-Supervised Learning Features

Impact factor: 2.9 · JCR quartile: Q2 (Engineering, Electrical & Electronic)
Nicholas Mehlman, Shri Narayanan
{"title":"Adversarial Robustness of Self-Supervised Learning Features","authors":"Nicholas Mehlman;Shri Narayanan","doi":"10.1109/OJSP.2025.3562797","DOIUrl":null,"url":null,"abstract":"As deep learning models have proliferated, concerns about their reliability and security have also increased. One significant challenge is understanding adversarial perturbations, which can alter a model's predictions despite being very small in magnitude. Prior work has proposed that this phenomenon results from a fundamental deficit in supervised learning, by which classifiers exploit whatever input features are more predictive, regardless of whether or not these features are robust to adversarial attacks. In this paper, we consider feature robustness in the context of contrastive self-supervised learning methods that have become especially common in recent years. Our findings suggest that the features learned during self-supervision are, in fact, more resistant to adversarial perturbations than those generated from supervised learning. However, we also find that these self-supervised features exhibit poorer inter-class disentanglement, limiting their contribution to overall classifier robustness.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"468-477"},"PeriodicalIF":2.9000,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10971198","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE open journal of signal processing","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10971198/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0

Abstract

As deep learning models have proliferated, concerns about their reliability and security have also increased. One significant challenge is understanding adversarial perturbations, which can alter a model's predictions despite being very small in magnitude. Prior work has proposed that this phenomenon results from a fundamental deficit in supervised learning, by which classifiers exploit whatever input features are more predictive, regardless of whether or not these features are robust to adversarial attacks. In this paper, we consider feature robustness in the context of contrastive self-supervised learning methods that have become especially common in recent years. Our findings suggest that the features learned during self-supervision are, in fact, more resistant to adversarial perturbations than those generated from supervised learning. However, we also find that these self-supervised features exhibit poorer inter-class disentanglement, limiting their contribution to overall classifier robustness.
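To make the two concepts in the abstract concrete, the sketch below illustrates (1) a small-magnitude adversarial perturbation of the kind described, here a single FGSM gradient-sign step, and (2) a simplified SimCLR-style InfoNCE contrastive loss representative of the self-supervised objectives studied. This is an illustrative sketch only, not the authors' experimental setup; the model, epsilon budget, and temperature are placeholder assumptions.

# Minimal sketch (assumed setup, not the paper's implementation).
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps=8 / 255):
    """Return x plus a small L-inf-bounded sign-gradient perturbation (FGSM)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # model is any classifier returning logits
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()

def info_nce_loss(z1, z2, temperature=0.1):
    """Simplified SimCLR-style contrastive loss over two augmented views z1, z2."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                      # (N, N) similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)    # positives on the diagonal
    return F.cross_entropy(logits, targets)

A typical robustness comparison of the kind the abstract alludes to would contrast accuracy on clean inputs x with accuracy on fgsm_perturb(model, x, y), once for a supervised classifier and once for a linear probe trained on features pre-trained with a contrastive objective such as info_nce_loss.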