Cross-batch Reference Learning for Deep Classification and Retrieval

Huei-Fang Yang, Kevin Lin, Chu-Song Chen
{"title":"深度分类与检索的跨批参考学习","authors":"Huei-Fang Yang, Kevin Lin, Chu-Song Chen","doi":"10.1145/2964284.2964324","DOIUrl":null,"url":null,"abstract":"Learning feature representations for image retrieval is essential to multimedia search and mining applications. Recently, deep convolutional networks (CNNs) have gained much attention due to their impressive performance on object detection and image classification, and the feature representations learned from a large-scale generic dataset (e.g., ImageNet) can be transferred to or fine-tuned on the datasets of other domains. However, when the feature representations learned with a deep CNN are applied to image retrieval, the performance is still not as good as they are used for classification, which restricts their applicability to relevant image search. To ensure the retrieval capability of the learned feature space, we introduce a new idea called cross-batch reference (CBR) to enhance the stochastic-gradient-descent (SGD) training of CNNs. In each iteration of our training process, the network adjustment relies not only on the training samples in a single batch, but also on the information passed by the samples in the other batches. This inter-batches communication mechanism is formulated as a cross-batch retrieval process based on the mean average precision (MAP) criterion, where the relevant and irrelevant samples are encouraged to be placed on top and rear of the retrieval list, respectively. The learned feature space is not only discriminative to different classes, but the samples that are relevant to each other or of the same class are also enforced to be centralized. To maximize the cross-batch MAP, we design a loss function that is an approximated lower bound of the MAP on the feature layer of the network, which is differentiable and easier for optimization. By combining the intra-batch classification and inter-batch cross-reference losses, the learned features are effective for both classification and retrieval tasks. Experimental results on various benchmarks demonstrate the effectiveness of our approach.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"126 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":"{\"title\":\"Cross-batch Reference Learning for Deep Classification and Retrieval\",\"authors\":\"Huei-Fang Yang, Kevin Lin, Chu-Song Chen\",\"doi\":\"10.1145/2964284.2964324\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Learning feature representations for image retrieval is essential to multimedia search and mining applications. Recently, deep convolutional networks (CNNs) have gained much attention due to their impressive performance on object detection and image classification, and the feature representations learned from a large-scale generic dataset (e.g., ImageNet) can be transferred to or fine-tuned on the datasets of other domains. However, when the feature representations learned with a deep CNN are applied to image retrieval, the performance is still not as good as they are used for classification, which restricts their applicability to relevant image search. To ensure the retrieval capability of the learned feature space, we introduce a new idea called cross-batch reference (CBR) to enhance the stochastic-gradient-descent (SGD) training of CNNs. 
In each iteration of our training process, the network adjustment relies not only on the training samples in a single batch, but also on the information passed by the samples in the other batches. This inter-batches communication mechanism is formulated as a cross-batch retrieval process based on the mean average precision (MAP) criterion, where the relevant and irrelevant samples are encouraged to be placed on top and rear of the retrieval list, respectively. The learned feature space is not only discriminative to different classes, but the samples that are relevant to each other or of the same class are also enforced to be centralized. To maximize the cross-batch MAP, we design a loss function that is an approximated lower bound of the MAP on the feature layer of the network, which is differentiable and easier for optimization. By combining the intra-batch classification and inter-batch cross-reference losses, the learned features are effective for both classification and retrieval tasks. Experimental results on various benchmarks demonstrate the effectiveness of our approach.\",\"PeriodicalId\":140670,\"journal\":{\"name\":\"Proceedings of the 24th ACM international conference on Multimedia\",\"volume\":\"126 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"15\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 24th ACM international conference on Multimedia\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2964284.2964324\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 24th ACM international conference on Multimedia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2964284.2964324","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 15

Abstract

Learning feature representations for image retrieval is essential to multimedia search and mining applications. Recently, deep convolutional networks (CNNs) have gained much attention due to their impressive performance on object detection and image classification, and the feature representations learned from a large-scale generic dataset (e.g., ImageNet) can be transferred to, or fine-tuned on, the datasets of other domains. However, when the feature representations learned with a deep CNN are applied to image retrieval, the performance is still not as good as when they are used for classification, which restricts their applicability to relevant image search. To ensure the retrieval capability of the learned feature space, we introduce a new idea called cross-batch reference (CBR) to enhance the stochastic-gradient-descent (SGD) training of CNNs. In each iteration of our training process, the network adjustment relies not only on the training samples in a single batch, but also on the information passed by the samples in the other batches. This inter-batch communication mechanism is formulated as a cross-batch retrieval process based on the mean average precision (MAP) criterion, in which relevant and irrelevant samples are encouraged to be placed at the top and the rear of the retrieval list, respectively. The learned feature space is thus not only discriminative across classes; samples that are relevant to each other or belong to the same class are also drawn close together. To maximize the cross-batch MAP, we design a loss function that is an approximate lower bound of the MAP on the feature layer of the network, and is therefore differentiable and easier to optimize. By combining the intra-batch classification loss and the inter-batch cross-reference loss, the learned features are effective for both classification and retrieval tasks. Experimental results on various benchmarks demonstrate the effectiveness of our approach.
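To make the training mechanism concrete, the following is a minimal PyTorch sketch of the cross-batch idea. Recall that for one query, average precision is AP = (1/R) * sum_k P(k) * rel(k), the precision at each rank where a relevant item appears averaged over the R relevant items, and MAP averages AP over all queries. In the sketch, the CrossBatchReference class, the margin and the 0.1 loss weight, and the smooth pairwise hinge surrogate used in place of the paper's approximated MAP lower bound are all illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F

class CrossBatchReference:
    """Holds the features and labels of the previous batch and scores the
    current batch against them with a differentiable retrieval loss.
    (Illustrative sketch; the paper's exact MAP lower bound differs.)"""

    def __init__(self, margin=0.5):
        self.margin = margin
        self.ref_feats = None    # detached features from the previous batch
        self.ref_labels = None

    def loss(self, feats, labels):
        if self.ref_feats is None:
            cbr = feats.new_zeros(())          # no reference batch yet
        else:
            # Cosine similarity of each current sample (query) against every
            # sample carried over from the previous batch (retrieval set).
            sim = F.normalize(feats, dim=1) @ F.normalize(self.ref_feats, dim=1).t()
            rel = labels[:, None] == self.ref_labels[None, :]
            pos, neg = sim[rel], sim[~rel]     # relevant / irrelevant pairs
            if pos.numel() == 0 or neg.numel() == 0:
                cbr = feats.new_zeros(())
            else:
                # Smooth pairwise surrogate: every relevant pair should score
                # at least `margin` above every irrelevant pair, pushing
                # relevant samples toward the top of the retrieval list and
                # irrelevant ones toward the rear.
                cbr = F.relu(self.margin - pos[:, None] + neg[None, :]).mean()
        # The current batch becomes the reference for the next iteration.
        self.ref_feats = feats.detach()
        self.ref_labels = labels.detach()
        return cbr

# Toy usage on random data (hypothetical shapes: 8-sample batches of 64-D
# features, 4 classes, and a stand-in linear classifier head).
cbr = CrossBatchReference()
for _ in range(2):
    feats = torch.randn(8, 64, requires_grad=True)
    labels = torch.randint(0, 4, (8,))
    logits = feats @ torch.randn(64, 4)
    total = F.cross_entropy(logits, labels) + 0.1 * cbr.loss(feats, labels)
    total.backward()

Note that in this sketch the stored batch is detached, so gradients flow only through the current batch while the previous one serves as a fixed retrieval set; this keeps the inter-batch communication cheap, but it is a design choice of the sketch rather than a detail confirmed by the abstract.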