An empirical study of fusion operators for multimodal image retrieval
G. Csurka, S. Clinchant
2012 10th International Workshop on Content-Based Multimedia Indexing (CBMI), published 2012-06-27
DOI: https://doi.org/10.1109/CBMI.2012.6269843
In this paper we present an empirical study of late fusion operators for multimodal image retrieval. We consider two experts, one based on textual and one on visual similarities between documents, and study the possibilities of going beyond simple score averaging. The main idea is to exploit the correlation between the two experts by encoding, explicitly or implicitly, an "and" and an "or" operator in an efficient way. We show through several experiments that operators combining both of these aspects generally outperform those that consider only one of them. Based on this observation, we propose several generalized versions of the most classical fusion operators and compare them on ImageCLEF benchmark datasets in both an unsupervised and a supervised framework.
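To make the "and"/"or" idea concrete, here is a minimal sketch of late fusion over two score lists. The abstract does not give the operators' formulas, so everything below is an illustrative assumption rather than the authors' method: the product stands in for an "and"-like operator, the maximum for an "or"-like one, and the function names and the `alpha` and `beta` weights are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale scores to [0, 1] so the two experts are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_average(text, visual, alpha=0.5):
    """Baseline: weighted score averaging (alpha is a hypothetical weight)."""
    return [alpha * t + (1 - alpha) * v for t, v in zip(text, visual)]


def fuse_and(text, visual):
    """'And'-like operator (assumed: product). High only if both experts agree."""
    return [t * v for t, v in zip(text, visual)]


def fuse_or(text, visual):
    """'Or'-like operator (assumed: maximum). High if either expert is confident."""
    return [max(t, v) for t, v in zip(text, visual)]


def fuse_and_or(text, visual, beta=0.5):
    """Blend of the two aspects (beta is a hypothetical mixing weight)."""
    both = fuse_and(text, visual)
    either = fuse_or(text, visual)
    return [beta * a + (1 - beta) * o for a, o in zip(both, either)]


def rank(scores):
    """Document indices sorted by decreasing fused score (ties keep input order)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy scores for four documents from a textual and a visual expert (made up).
text_scores = min_max_normalize([0.9, 0.8, 0.1, 0.5])
visual_scores = min_max_normalize([0.1, 0.7, 0.9, 0.5])

for name, fuse in [("average", fuse_average), ("and", fuse_and),
                   ("or", fuse_or), ("and+or", fuse_and_or)]:
    print(name, rank(fuse(text_scores, visual_scores)))
```

In this toy data, document 1 scores well with both experts, so the "and"-like product ranks it first, while the "or"-like maximum favors documents 0 and 2, each of which is strongly supported by a single expert. The blended operator sits between the two behaviors.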