An empirical study of fusion operators for multimodal image retrieval

2012 10th International Workshop on Content-Based Multimedia Indexing (CBMI). Pub Date: 2012-06-27. DOI: 10.1109/CBMI.2012.6269843

G. Csurka, S. Clinchant

Citations: 14

Abstract

This paper presents an empirical study of late fusion operators for multimodal image retrieval. We consider two experts, one based on textual and one on visual similarities between documents, and study how to go beyond simple score averaging. The main idea is to exploit the correlation between the two experts by efficiently encoding, explicitly or implicitly, an "and" and an "or" operator. Several experiments show that operators combining both of these aspects generally outperform those that consider only one of them. Based on this observation, we propose several generalized versions of most classical fusion operators and compare them on ImageClef benchmark datasets in both an unsupervised and a supervised framework.
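To make the "and"/"or" distinction concrete, the following is a minimal sketch of three textbook late fusion rules applied to per-document scores from a textual and a visual expert. It is an illustration under stated assumptions, not the operators proposed in the paper: the function names, the min-max normalization, the choice of product for "and" and maximum for "or", and the toy scores are all hypothetical.

```python
# Illustrative late fusion of two experts' scores (not the paper's operators).
# All names, the normalization choice, and the toy scores are assumptions.

def minmax(scores):
    """Rescale scores to [0, 1] so the two experts are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def fuse_mean(text, visual, alpha=0.5):
    """Baseline: weighted score averaging."""
    return [alpha * t + (1 - alpha) * v for t, v in zip(text, visual)]

def fuse_and(text, visual):
    """'And'-like rule: the product is high only if both experts agree."""
    return [t * v for t, v in zip(text, visual)]

def fuse_or(text, visual):
    """'Or'-like rule: the maximum is high if either expert is confident."""
    return [max(t, v) for t, v in zip(text, visual)]

def rank(scores):
    """Document indices sorted by decreasing fused score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Toy scores for four documents (made up for illustration).
text_scores = minmax([0.9, 0.2, 0.6, 0.1])
visual_scores = minmax([0.1, 0.8, 0.7, 0.2])

mean_ranking = rank(fuse_mean(text_scores, visual_scores))
and_ranking = rank(fuse_and(text_scores, visual_scores))
or_ranking = rank(fuse_or(text_scores, visual_scores))
```

In this toy setting the "and"-like rule favors the document that both experts score reasonably well, while the "or"-like rule promotes documents that a single expert ranks first. A rule that combines both behaviors is the kind of operator the abstract reports as generally performing better.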

