Discriminative Multi-modal Feature Fusion for RGBD Indoor Scene Recognition

Hongyuan Zhu, Jean-Baptiste Weibel, Shijian Lu
{"title":"Discriminative Multi-modal Feature Fusion for RGBD Indoor Scene Recognition","authors":"Hongyuan Zhu, Jean-Baptiste Weibel, Shijian Lu","doi":"10.1109/CVPR.2016.324","DOIUrl":null,"url":null,"abstract":"RGBD scene recognition has attracted increasingly attention due to the rapid development of depth sensors and their wide application scenarios. While many research has been conducted, most work used hand-crafted features which are difficult to capture high-level semantic structures. Recently, the feature extracted from deep convolutional neural network has produced state-of-the-art results for various computer vision tasks, which inspire researchers to explore incorporating CNN learned features for RGBD scene understanding. On the other hand, most existing work combines rgb and depth features without adequately exploiting the consistency and complementary information between them. Inspired by some recent work on RGBD object recognition using multi-modal feature fusion, we introduce a novel discriminative multi-modal fusion framework for rgbd scene recognition for the first time which simultaneously considers the inter-and intra-modality correlation for all samples and meanwhile regularizing the learned features to be discriminative and compact. The results from the multimodal layer can be back-propagated to the lower CNN layers, hence the parameters of the CNN layers and multimodal layers are updated iteratively until convergence. Experiments on the recently proposed large scale SUN RGB-D datasets show that our method achieved the state-of-the-art without any image segmentation.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"18 1","pages":"2969-2976"},"PeriodicalIF":0.0000,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"97","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPR.2016.324","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 97

Abstract

RGBD scene recognition has attracted increasing attention due to the rapid development of depth sensors and their wide range of application scenarios. Although much research has been conducted, most work relies on hand-crafted features, which struggle to capture high-level semantic structure. Recently, features extracted from deep convolutional neural networks have produced state-of-the-art results on various computer vision tasks, inspiring researchers to incorporate CNN-learned features into RGBD scene understanding. On the other hand, most existing work combines RGB and depth features without adequately exploiting the consistency and complementarity between them. Inspired by recent work on RGBD object recognition using multi-modal feature fusion, we introduce, for the first time, a discriminative multi-modal fusion framework for RGBD scene recognition that simultaneously considers the inter- and intra-modality correlations of all samples while regularizing the learned features to be discriminative and compact. The gradients from the multi-modal layer can be back-propagated to the lower CNN layers, so the parameters of the CNN layers and the multi-modal layers are updated iteratively until convergence. Experiments on the recently proposed large-scale SUN RGB-D dataset show that our method achieves state-of-the-art performance without any image segmentation.
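To make the joint training scheme the abstract describes concrete, below is a minimal two-stream sketch in PyTorch. It is an illustration only, not the authors' formulation: the ResNet-18 backbones, the linear fusion layer, the cosine-based inter-modality consistency term, and all hyper-parameters are assumptions, and the paper's actual correlation and discriminative-compact regularizers are defined differently. The sketch does show the key property claimed in the abstract: gradients from the multi-modal layer flow back into both CNN streams, so the CNN and fusion parameters are updated together.

```python
# Minimal two-stream fusion sketch (PyTorch). NOT the authors' exact method:
# backbones, fusion layer, loss terms, and hyper-parameters are assumptions.
import torch
import torch.nn as nn
import torchvision.models as models

class TwoStreamFusion(nn.Module):
    def __init__(self, num_classes=19, feat_dim=256):
        super().__init__()
        # Separate CNN streams for the RGB and depth inputs.
        self.rgb_cnn = models.resnet18(weights=None)
        self.depth_cnn = models.resnet18(weights=None)
        self.rgb_cnn.fc = nn.Linear(512, feat_dim)
        self.depth_cnn.fc = nn.Linear(512, feat_dim)
        # Multi-modal layer on top of the concatenated modality features.
        self.fusion = nn.Linear(2 * feat_dim, feat_dim)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, rgb, depth):
        f_rgb = self.rgb_cnn(rgb)        # intra-modality RGB feature
        f_depth = self.depth_cnn(depth)  # intra-modality depth feature
        fused = torch.relu(self.fusion(torch.cat([f_rgb, f_depth], dim=1)))
        return self.classifier(fused), f_rgb, f_depth

def consistency_penalty(f_rgb, f_depth):
    # Illustrative inter-modality term: push the normalized RGB and depth
    # features of the same sample toward agreement (1 - cosine similarity).
    f_rgb = nn.functional.normalize(f_rgb, dim=1)
    f_depth = nn.functional.normalize(f_depth, dim=1)
    return (1 - (f_rgb * f_depth).sum(dim=1)).mean()

model = TwoStreamFusion()
opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
ce = nn.CrossEntropyLoss()

# Dummy batch; depth is assumed pre-encoded to 3 channels (e.g. HHA).
rgb = torch.randn(4, 3, 224, 224)
depth = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, 19, (4,))

opt.zero_grad()
logits, f_rgb, f_depth = model(rgb, depth)
# Gradients from the fusion/classification loss propagate into both CNN
# streams, so CNN and multi-modal parameters are updated jointly.
loss = ce(logits, labels) + 0.1 * consistency_penalty(f_rgb, f_depth)
loss.backward()
opt.step()
```

In practice this step would be repeated over the dataset until convergence, which is the iterative joint update the abstract refers to; the 0.1 weight on the consistency term is an arbitrary placeholder.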