RGB-D Scene Labeling with Multimodal Recurrent Neural Networks

Heng Fan, Xue Mei, D. Prokhorov, Haibin Ling
{"title":"RGB-D Scene Labeling with Multimodal Recurrent Neural Networks","authors":"Heng Fan, Xue Mei, D. Prokhorov, Haibin Ling","doi":"10.1109/CVPRW.2017.31","DOIUrl":null,"url":null,"abstract":"Recurrent neural networks (RNNs) are able to capture context in an image by modeling long-range semantic dependencies among image units. However, existing methods only utilize RNNs to model dependencies of a single modality (e.g., RGB) for labeling. In this work we extend this single-modal RNNs to multimodal RNNs (MM-RNNs) and apply it to RGB-D scene labeling. Our MM-RNNs are capable of seamlessly modeling dependencies of both RGB and depth modalities, and allow 'memory' sharing across modalities. By sharing 'memory', each modality possesses multiple properties of itself and other modalities, and becomes more discriminative to distinguish pixels. Moreover, we also analyse two simple extensions of single-modal RNNs and demonstrate that our MM-RNNs perform better than both of them. Integrating with convolutional neural networks (CNNs), we build an end-to-end network for RGB-D scene labeling. Extensive experiments on NYU depth V1 and V2 demonstrate the effectiveness of MM-RNNs.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"41 1","pages":"203-211"},"PeriodicalIF":0.0000,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPRW.2017.31","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

Recurrent neural networks (RNNs) are able to capture context in an image by modeling long-range semantic dependencies among image units. However, existing methods only utilize RNNs to model dependencies of a single modality (e.g., RGB) for labeling. In this work we extend this single-modal RNNs to multimodal RNNs (MM-RNNs) and apply it to RGB-D scene labeling. Our MM-RNNs are capable of seamlessly modeling dependencies of both RGB and depth modalities, and allow 'memory' sharing across modalities. By sharing 'memory', each modality possesses multiple properties of itself and other modalities, and becomes more discriminative to distinguish pixels. Moreover, we also analyse two simple extensions of single-modal RNNs and demonstrate that our MM-RNNs perform better than both of them. Integrating with convolutional neural networks (CNNs), we build an end-to-end network for RGB-D scene labeling. Extensive experiments on NYU depth V1 and V2 demonstrate the effectiveness of MM-RNNs.
基于多模态递归神经网络的RGB-D场景标记
递归神经网络(RNNs)能够通过建模图像单元之间的远程语义依赖关系来捕获图像中的上下文。然而,现有方法仅利用rnn对单一模态(例如RGB)的依赖关系进行建模以进行标记。在这项工作中,我们将这种单模态rnn扩展到多模态rnn (mm - rnn),并将其应用于RGB-D场景标记。我们的mm - rnn能够无缝地建模RGB和深度模式的依赖关系,并允许跨模式的“内存”共享。通过共享“记忆”,每个模态都具有自身和其他模态的多重属性,并且在区分像素时更具辨别能力。此外,我们还分析了单模态rnn的两个简单扩展,并证明了我们的mm - rnn比它们都表现得更好。结合卷积神经网络(cnn),构建了RGB-D场景标记的端到端网络。在NYU深度V1和V2上的大量实验证明了mm - rnn的有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信