Scene labeling with LSTM recurrent neural networks

2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date : 2015-06-07 DOI:10.1109/CVPR.2015.7298977

Wonmin Byeon, T. Breuel, Federico Raue, M. Liwicki

{"title":"Scene labeling with LSTM recurrent neural networks","authors":"Wonmin Byeon, T. Breuel, Federico Raue, M. Liwicki","doi":"10.1109/CVPR.2015.7298977","DOIUrl":null,"url":null,"abstract":"This paper addresses the problem of pixel-level segmentation and classification of scene images with an entirely learning-based approach using Long Short Term Memory (LSTM) recurrent neural networks, which are commonly used for sequence classification. We investigate two-dimensional (2D) LSTM networks for natural scene images taking into account the complex spatial dependencies of labels. Prior methods generally have required separate classification and image segmentation stages and/or pre- and post-processing. In our approach, classification, segmentation, and context integration are all carried out by 2D LSTM networks, allowing texture and spatial model parameters to be learned within a single model. The networks efficiently capture local and global contextual information over raw RGB values and adapt well for complex scene images. Our approach, which has a much lower computational complexity than prior methods, achieved state-of-the-art performance over the Stanford Background and the SIFT Flow datasets. In fact, if no pre- or post-processing is applied, LSTM networks outperform other state-of-the-art approaches. Hence, only with a single-core Central Processing Unit (CPU), the running time of our approach is equivalent or better than the compared state-of-the-art approaches which use a Graphics Processing Unit (GPU). Finally, our networks' ability to visualize feature maps from each layer supports the hypothesis that LSTM networks are overall suited for image processing tasks.","PeriodicalId":444472,"journal":{"name":"2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"345","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPR.2015.7298977","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 345

Abstract

This paper addresses the problem of pixel-level segmentation and classification of scene images with an entirely learning-based approach using Long Short Term Memory (LSTM) recurrent neural networks, which are commonly used for sequence classification. We investigate two-dimensional (2D) LSTM networks for natural scene images taking into account the complex spatial dependencies of labels. Prior methods generally have required separate classification and image segmentation stages and/or pre- and post-processing. In our approach, classification, segmentation, and context integration are all carried out by 2D LSTM networks, allowing texture and spatial model parameters to be learned within a single model. The networks efficiently capture local and global contextual information over raw RGB values and adapt well for complex scene images. Our approach, which has a much lower computational complexity than prior methods, achieved state-of-the-art performance over the Stanford Background and the SIFT Flow datasets. In fact, if no pre- or post-processing is applied, LSTM networks outperform other state-of-the-art approaches. Hence, only with a single-core Central Processing Unit (CPU), the running time of our approach is equivalent or better than the compared state-of-the-art approaches which use a Graphics Processing Unit (GPU). Finally, our networks' ability to visualize feature maps from each layer supports the hypothesis that LSTM networks are overall suited for image processing tasks.

查看原文本刊更多论文

基于LSTM递归神经网络的场景标注

本文采用一种完全基于学习的方法，利用长短期记忆(LSTM)递归神经网络解决了场景图像的像素级分割和分类问题。考虑到标签的复杂空间依赖性，我们研究了自然场景图像的二维(2D) LSTM网络。先前的方法通常需要单独的分类和图像分割阶段和/或预处理和后处理。在我们的方法中，分类、分割和上下文集成都是由2D LSTM网络进行的，允许在单个模型中学习纹理和空间模型参数。该网络有效地捕获了原始RGB值上的局部和全局上下文信息，并能很好地适应复杂的场景图像。我们的方法比以前的方法具有更低的计算复杂度，在斯坦福背景和SIFT流数据集上实现了最先进的性能。事实上，如果不进行预处理或后处理，LSTM网络的性能会优于其他最先进的方法。因此，仅使用单核中央处理器(CPU)，我们的方法的运行时间相当于或优于使用图形处理单元(GPU)的比较先进的方法。最后，我们的网络从每一层可视化特征映射的能力支持了LSTM网络总体上适合图像处理任务的假设。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

自引率

0.00%

发文量