Triplet Fusion Network Hashing for Unpaired Cross-Modal Retrieval

Proceedings of the 2019 on International Conference on Multimedia Retrieval
Pub Date: 2019-06-05
DOI: 10.1145/3323873.3325041

Zhikai Hu, Xin Liu, Xingzhi Wang, Yiu-ming Cheung, N. Wang, Yewang Chen

Citations: 21

Abstract

With the dramatic increase of multimedia data on the Internet, cross-modal retrieval has become an important and valuable task in search systems. The key challenge of this task is how to build the correlation between multi-modal data. Most existing approaches focus only on paired data and exploit the pairwise relationships of multi-modal data to explore the correlation between modalities. In practice, however, unpaired data are more common on the Internet, yet few methods address them. To utilize both paired and unpaired data, we propose a one-stream framework, triplet fusion network hashing (TFNH), which consists of two main parts. The first part is a triplet network that handles both kinds of data with the help of a zero-padding operation. The second part consists of two data classifiers that bridge the gap between paired and unpaired data. In addition, we embed manifold learning into the framework to preserve both inter-modal and intra-modal similarity, to explore the relationship between unpaired and paired data, and to bridge the gap between them during learning. Extensive experiments show that the proposed approach outperforms several state-of-the-art methods on two datasets in the paired scenario. We further evaluate its ability to handle the unpaired scenario and its robustness with respect to the pairwise constraint. The results show that even when 50% of the data are discarded under the setting in [19], TFNH still performs better than other unpaired approaches, and that when only 70% of the pairwise relationships are preserved, TFNH can still outperform almost all paired approaches.
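The zero-padding idea mentioned in the abstract can be illustrated with a small sketch. It assumes (hypothetically, since the abstract does not give the input layout) that each sample is fed to the network as one concatenated vector [image features | text features], with the missing modality of an unpaired sample filled with zeros so that paired and unpaired samples share one input shape. The binarization and retrieval steps (sign() to {-1, +1} codes, then ranking by Hamming distance) are common practice in hashing-based retrieval and are assumed here as well. The names zero_pad_pair, to_hash_code, hamming_distance, and rank_by_hamming are illustrative, not the authors' code, and the triplet network, the two data classifiers, and the manifold-learning term are not modeled.

```python
from typing import List, Optional, Sequence


def zero_pad_pair(image_feat: Optional[Sequence[float]],
                  text_feat: Optional[Sequence[float]],
                  image_dim: int,
                  text_dim: int) -> List[float]:
    """Build one fused input vector for a paired or an unpaired sample.

    Hypothetical layout: [image features | text features]. A missing
    modality (an unpaired sample) is replaced by zeros, so paired and
    unpaired samples share a single input shape.
    """
    if image_feat is None and text_feat is None:
        raise ValueError("a sample needs at least one modality")
    img = list(image_feat) if image_feat is not None else [0.0] * image_dim
    txt = list(text_feat) if text_feat is not None else [0.0] * text_dim
    if len(img) != image_dim or len(txt) != text_dim:
        raise ValueError("feature length does not match the declared dimension")
    return img + txt


def to_hash_code(real_valued: Sequence[float]) -> List[int]:
    """Binarize a real-valued output into {-1, +1}; 0 is mapped to +1 here."""
    return [1 if v >= 0 else -1 for v in real_valued]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def rank_by_hamming(query: Sequence[int],
                    database: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices ordered by Hamming distance to the query.

    Python's sort is stable, so ties keep their database order.
    """
    return sorted(range(len(database)),
                  key=lambda i: hamming_distance(query, database[i]))


if __name__ == "__main__":
    # An image-only (unpaired) sample: the text slot is zero-padded.
    print(zero_pad_pair([0.2, 0.4], None, image_dim=2, text_dim=3))
    # prints [0.2, 0.4, 0.0, 0.0, 0.0]
    query = to_hash_code([0.3, -0.1, 0.0, -2.0])  # [1, -1, 1, -1]
    database = [[1, -1, 1, -1], [-1, 1, -1, 1], [1, 1, 1, -1]]
    print(rank_by_hamming(query, database))
    # prints [0, 2, 1]
```

In a real system the zero-padded vectors would be the inputs of the learned network, and the database codes would come from samples of the other modality; here they are written by hand only to show the mechanics of padding, binarization, and Hamming ranking.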
