SUN RGB-D: A RGB-D scene understanding benchmark suite

2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Pub Date: 2015-06-07. DOI: 10.1109/CVPR.2015.7298655

Shuran Song, Samuel P. Lichtenberg, Jianxiong Xiao

Citations: 1433

Abstract

Although RGB-D sensors have enabled major breakthroughs for several vision tasks, such as 3D reconstruction, we have not attained the same level of success in high-level scene understanding. Perhaps one of the main reasons is the lack of a large-scale benchmark with 3D annotations and 3D evaluation metrics. In this paper, we introduce an RGB-D benchmark suite with the goal of advancing the state of the art in all major scene understanding tasks. Our dataset is captured by four different sensors and contains 10,335 RGB-D images, a scale similar to that of PASCAL VOC. The whole dataset is densely annotated and includes 146,617 2D polygons and 64,595 3D bounding boxes with accurate object orientations, as well as a 3D room layout and scene category for each image. This dataset enables us to train data-hungry algorithms for scene understanding tasks, evaluate them using meaningful 3D metrics, avoid overfitting to a small testing set, and study cross-sensor bias.
