{"title":"Multiple Negative Samples Based on GAN for Cross-Modal Retrieval","authors":"Xiaoqian Ma, Feifei Wang, Yahui Hou","doi":"10.1109/ICECE54449.2021.9674509","DOIUrl":null,"url":null,"abstract":"Cross-modal retrieval has attracted wide attention for retrieving multimedia data such as images, text, audio, and video. However, due to the great differences in the underlying representation and distribution of modalities, it’s still challenging to find the semantic similarity of various modalities. Benefited from the introduction of the generative adversarial network (GAN) into cross-modal retrieval, the performance of models has been significantly improved. To generate discriminative representations, adversarial learning utilizes triplet constraint to establish connections among the anchor, the positive sample, and the negative sample. The strategy of triplet loss can separate the anchor and a certain class of the negative sample, but fails to guarantee the anchor and other categories of items could also be scattered in the subspace. This paper proposes a novel multiple negative samples model based on GAN (MNS-GAN) to increase intra-modal discrimination. Comprehensive experiments show that our proposed MNS-GAN method outperforms the state-of-the-art cross-modal retrieval methods.","PeriodicalId":166178,"journal":{"name":"2021 IEEE 4th International Conference on Electronics and Communication Engineering (ICECE)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 4th International Conference on Electronics and Communication Engineering (ICECE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICECE54449.2021.9674509","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Cross-modal retrieval has attracted wide attention for retrieving multimedia data such as images, text, audio, and video. However, due to the great differences in the underlying representations and distributions of the modalities, it is still challenging to measure semantic similarity across them. Since the introduction of generative adversarial networks (GANs) into cross-modal retrieval, the performance of retrieval models has improved significantly. To generate discriminative representations, adversarial learning employs a triplet constraint to establish relations among an anchor, a positive sample, and a negative sample. The triplet loss can separate the anchor from one class of negative samples, but it cannot guarantee that the anchor is also well separated from items of the remaining categories in the common subspace. This paper proposes a novel multiple-negative-samples model based on GAN (MNS-GAN) to increase intra-modal discrimination. Comprehensive experiments show that the proposed MNS-GAN outperforms state-of-the-art cross-modal retrieval methods.
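The abstract does not spell out MNS-GAN's loss, but the limitation it describes can be sketched in code: a standard triplet loss enforces a margin against a single negative, while a multi-negative variant enforces it against negatives drawn from several classes at once. The sketch below is a minimal illustration under assumed conventions (PyTorch, Euclidean distance, a margin of 0.2, and the function names and tensor shapes are all hypothetical), not the paper's implementation.

```python
import torch
import torch.nn.functional as F


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard single-negative triplet loss.

    anchor, positive, negative: (B, D) embeddings. Pushes the anchor
    away from one negative class only, which is the limitation the
    abstract points out.
    """
    d_pos = F.pairwise_distance(anchor, positive)  # (B,)
    d_neg = F.pairwise_distance(anchor, negative)  # (B,)
    return F.relu(d_pos - d_neg + margin).mean()


def multi_negative_triplet_loss(anchor, positive, negatives, margin=0.2):
    """Illustrative multi-negative variant (assumed, not from the paper).

    anchor, positive: (B, D) embeddings.
    negatives:        (B, K, D) -- K negatives per anchor, ideally one
                      from each of K distinct categories, so the margin
                      is enforced against every sampled class rather
                      than a single one.
    """
    d_pos = F.pairwise_distance(anchor, positive)                    # (B,)
    # Distances from each anchor to each of its K negatives: (B, K).
    d_neg = torch.cdist(anchor.unsqueeze(1), negatives).squeeze(1)
    # Hinge applied per negative, then averaged over all B*K triplets.
    return F.relu(d_pos.unsqueeze(1) - d_neg + margin).mean()


if __name__ == "__main__":
    B, K, D = 8, 5, 128  # batch size, negatives per anchor, embedding dim
    anchor = torch.randn(B, D)
    positive = torch.randn(B, D)
    negatives = torch.randn(B, K, D)
    print(multi_negative_triplet_loss(anchor, positive, negatives))
```

In this reading, averaging the hinge over K negatives from distinct classes is what scatters the anchor away from multiple categories in the shared subspace at once; whether MNS-GAN averages, sums, or takes the hardest negative is not stated in the abstract.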