Robust Cross-Modal Retrieval by Adversarial Training
Tao Zhang, Shiliang Sun, Jing Zhao
2022 International Joint Conference on Neural Networks (IJCNN)
Published: 2022-07-18 · DOI: 10.1109/IJCNN55064.2022.9892637
Citations: 1
Abstract
Cross-modal retrieval is usually built on cross-modal representation learning, which extracts semantic information from cross-modal data. Recent work shows that cross-modal representation learning is vulnerable to adversarial attacks, even when large-scale pre-trained networks are used. By attacking the representation, an adversary can easily compromise downstream tasks, especially cross-modal retrieval. Adversarial attacks on either modality readily cause obvious retrieval errors, which poses the challenge of improving the adversarial robustness of cross-modal retrieval. In this paper, we propose a robust cross-modal retrieval method (RoCMR), which generates adversarial examples for both the query modality and the candidate modality and performs adversarial training for cross-modal retrieval. Specifically, we generate adversarial examples for both image and text modalities and train the model with both benign and adversarial examples in a contrastive learning framework. We evaluate RoCMR on two datasets and show its effectiveness in defending against gradient-based attacks.
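The general recipe the abstract describes — generate adversarial perturbations against a contrastive cross-modal loss, then train on benign and adversarial examples together — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the linear "encoders", the FGSM-style one-step attack, the epsilon value, and the finite-difference gradient (used to keep the example dependency-free) are all assumptions.

```python
# Illustrative sketch of adversarial contrastive training for cross-modal
# retrieval. All components here (toy linear encoders, FGSM-style attack,
# finite-difference gradients) are stand-ins, not the paper's RoCMR method.
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "encoders" for the two modalities (4-dim input -> 3-dim embedding).
W_img = rng.normal(size=(4, 3))
W_txt = rng.normal(size=(4, 3))

def embed(x, W):
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)  # L2-normalize

def info_nce(imgs, txts, tau=0.1):
    """InfoNCE contrastive loss: each image should match its paired text
    among all candidate texts (matched pairs lie on the diagonal)."""
    zi, zt = embed(imgs, W_img), embed(txts, W_txt)
    sims = zi @ zt.T / tau
    logits = sims - sims.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

imgs = rng.normal(size=(5, 4))   # 5 paired image/text feature vectors
txts = rng.normal(size=(5, 4))

def fgsm_attack(imgs, txts, eps=0.05, h=1e-5):
    """One-step attack on the image modality: x' = x + eps * sign(dL/dx).
    The gradient is estimated by finite differences so the sketch needs
    no autodiff library; a real implementation would use backprop."""
    grad = np.zeros_like(imgs)
    base = info_nce(imgs, txts)
    for idx in np.ndindex(*imgs.shape):
        bumped = imgs.copy()
        bumped[idx] += h
        grad[idx] = (info_nce(bumped, txts) - base) / h
    return imgs + eps * np.sign(grad)

adv_imgs = fgsm_attack(imgs, txts)
clean_loss = info_nce(imgs, txts)
adv_loss = info_nce(adv_imgs, txts)

# Adversarial training would minimize a mix of benign and adversarial losses
# (the 0.5/0.5 weighting is an arbitrary illustrative choice):
train_loss = 0.5 * clean_loss + 0.5 * adv_loss
print(f"clean loss: {clean_loss:.4f}, adversarial loss: {adv_loss:.4f}")
```

In a full system the same attack step would also be applied to the text modality (e.g., in its embedding space), and both sets of adversarial examples would be fed back through the contrastive objective at each training iteration, as the abstract outlines.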