{"title":"Efficient Dual Adversarial Cross Modal Retrieval By Advanced Triplet Loss","authors":"Zhichao Han, Huan Zhou, Kezhong Nong, Zhe Li, Guoyong Lin, Chengjia Huang","doi":"10.1109/ICCSI55536.2022.9970558","DOIUrl":null,"url":null,"abstract":"With the development of technology, the modality of multimedia information become diverse, such as pictures, short videos, text, and so on. However, there is a semantic gap between different media, for example, the image and text are independent of each other and do not interact with each other. How establish retrieval links between different modalities has become more and more important. In this paper, we proposed a modal consisting of dual adversarial neural networks, which obtain the high-order semantics of image and text respectively. Then, the triplet loss is used to widen the distance between different categories in the common space, to obtain a better cross-modal retrieval performance. We conduct experiments on three commonly used benchmark datasets (Wikipedia, NUS-WIDE, and Pascal Sentences), and the experimental results show that our method can effectively improve the performance of cross-modal retrieval.","PeriodicalId":421514,"journal":{"name":"2022 International Conference on Cyber-Physical Social Intelligence (ICCSI)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Conference on Cyber-Physical Social Intelligence (ICCSI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCSI55536.2022.9970558","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
With the development of technology, the modalities of multimedia information have become diverse, including pictures, short videos, text, and so on. However, there is a semantic gap between different media; for example, images and text are independent of each other and do not interact. How to establish retrieval links between different modalities has therefore become increasingly important. In this paper, we propose a model consisting of dual adversarial neural networks, which obtain the high-order semantics of images and text respectively. Then, a triplet loss is used to widen the distance between different categories in the common space, yielding better cross-modal retrieval performance. We conduct experiments on three commonly used benchmark datasets (Wikipedia, NUS-WIDE, and Pascal Sentences), and the experimental results show that our method can effectively improve the performance of cross-modal retrieval.
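To make the triplet-loss idea in the abstract concrete, below is a minimal, illustrative sketch (not the authors' code) of a margin-based triplet loss applied to image and text embeddings in a shared common space; the PyTorch framing, embedding dimension, margin value, and variable names are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Margin-based triplet loss on L2-normalized embeddings in the common space.

    anchor:   embeddings from one modality (e.g., the image branch)
    positive: embeddings of the same category from the other modality
    negative: embeddings of a different category
    """
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negative = F.normalize(negative, dim=-1)
    d_pos = (anchor - positive).pow(2).sum(dim=-1)  # distance to same-category item
    d_neg = (anchor - negative).pow(2).sum(dim=-1)  # distance to different-category item
    # Penalize triplets where the negative is not at least `margin` farther than the positive,
    # which widens the gap between categories in the common space.
    return F.relu(d_pos - d_neg + margin).mean()

# Toy usage with random tensors standing in for branch outputs (128-d common space assumed).
img = torch.randn(8, 128)        # image-branch embeddings (anchors)
txt_same = torch.randn(8, 128)   # text embeddings of the same category (positives)
txt_diff = torch.randn(8, 128)   # text embeddings of a different category (negatives)
loss = triplet_loss(img, txt_same, txt_diff)
```

In practice such a loss would be combined with the adversarial objectives of the two branches during training; the sketch above only shows the distance-widening term described in the abstract.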