Reconstructing room scales with a single sound for augmented reality displays

IF 3.4 · Zone 3 (Engineering and Technology) · Q2 (Materials Science, Multidisciplinary)

Journal of Information Display Pub Date : 2022-11-15 DOI:10.1080/15980316.2022.2145377

Benjamin Liang, AN Liang, Irán R. Román, Tomer Weiss, Budmonde Duinkharjav, J. Bello, Qi Sun

Journal of Information Display, Volume 24(1), pages 1-12. Publication type: Journal Article.

Citations: 1

Abstract

Perceiving and reconstructing the 3D physical environment is an essential task with broad applications for Augmented Reality (AR) displays. For example, reconstructed geometries are commonly used to display 3D objects at accurate positions. Camera-captured images are a frequently used data source for realistically reconstructing 3D physical surroundings, but they are limited to line-of-sight environments and require time-consuming, repetitive capture to obtain a full 3D picture. For instance, current AR devices require users to scan an entire room to obtain its dimensions. This optical process is tedious, and it is inapplicable when the space is occluded or inaccessible. Unlike light, audio waves propagate through space by bouncing off different surfaces and are not 'occluded' by a single object such as a wall. In this research, we ask the question 'can one hear the size of a room?' To answer it, we propose an approach for inferring room geometries from only a single sound, which we define as an audio wave sequence played from a single loudspeaker, leveraging deep learning to decode the spatial information implicitly carried in a single speaker-and-microphone system. Through a series of experiments and studies, we demonstrate the method's effectiveness at inferring a 3D environment's spatial layout. Our work introduces a robust building block for multi-modal layout reconstruction.
