A. Chistyakova, M. Cherepnina, K. Arkhipenko, Sergey D. Kuznetsov, Chang-Seok Oh, Sebeom Park
Evaluation of interpretability methods for adversarial robustness on real-world datasets. 2021 Ivannikov Memorial Workshop (IVMEM), September 2021. DOI: 10.1109/ivmem53963.2021.00007
Citations: 1
Abstract
Adversarial training is considered the most powerful approach for achieving robustness of deep neural networks against attacks involving adversarial examples. However, recent works have shown that a similar level of robustness can be achieved by other means, namely interpretability-based regularization. We evaluate these interpretability-based approaches on real-world ResNet models trained on the CIFAR-10 and ImageNet datasets. Our results show that interpretability methods can marginally improve robustness when combined with adversarial training; however, they add computational overhead that makes these approaches questionable for such models and datasets.
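To make the adversarial-training setting concrete, the sketch below shows the core idea of crafting an adversarial example with a fast gradient sign method (FGSM)-style step. This is an illustrative toy on a hypothetical linear classifier with a hinge loss, not the paper's setup (the paper evaluates ResNet models on CIFAR-10 and ImageNet); all names and values here are assumptions for illustration.

```python
import numpy as np

# Toy linear classifier (hypothetical, for illustration only):
# prediction sign(w . x), hinge loss max(0, 1 - y * w . x).
rng = np.random.default_rng(0)

w = rng.normal(size=4)   # hypothetical weight vector
x = rng.normal(size=4)   # a clean input
y = 1.0                  # its label, in {-1, +1}

def hinge_loss(w, x, y):
    return max(0.0, 1.0 - y * w.dot(x))

# Gradient of the hinge loss w.r.t. the input x
# (it is -y * w while the margin is violated, zero otherwise).
grad_x = -y * w if hinge_loss(w, x, y) > 0 else np.zeros_like(x)

# FGSM-style step: perturb x by eps along the sign of the gradient,
# which can only increase (never decrease) the loss on this model.
eps = 0.1
x_adv = x + eps * np.sign(grad_x)

assert hinge_loss(w, x_adv, y) >= hinge_loss(w, x, y)
```

Adversarial training then folds such perturbed inputs back into the training loop, minimizing the loss on `x_adv` rather than on `x`; the interpretability-based regularizers evaluated in the paper aim to reach comparable robustness without that inner attack step.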