3DVQA: Visual Question Answering for 3D Environments

2022 19th Conference on Robots and Vision (CRV). Pub Date: 2022-05-01. DOI: 10.1109/CRV55824.2022.00038

Yasaman Etesam, Leon Kochiev, Angel X. Chang

Citations: 3

Abstract

Visual Question Answering (VQA) is a widely studied problem in computer vision and natural language processing. However, current approaches to VQA have been investigated primarily in the 2D image domain. We study VQA in the 3D domain, with our input being point clouds of real-world 3D scenes instead of 2D images. We believe that this 3D data modality provides richer spatial relation information that is of interest in the VQA task. In this paper, we introduce the 3DVQA-ScanNet dataset, the first VQA dataset in 3D, and we investigate the performance of a spectrum of baseline approaches on the 3D VQA task.

