Reconstructing room scales with a single sound for augmented reality displays

IF 3.7 · CAS Tier 3 (Engineering & Technology) · JCR Q2 (MATERIALS SCIENCE, MULTIDISCIPLINARY)
Benjamin Liang, An Liang, Irán R. Román, Tomer Weiss, Budmonde Duinkharjav, J. Bello, Qi Sun
{"title":"Reconstructing room scales with a single sound for augmented reality displays","authors":"Benjamin Liang, AN Liang, Irán R. Román, Tomer Weiss, Budmonde Duinkharjav, J. Bello, Qi Sun","doi":"10.1080/15980316.2022.2145377","DOIUrl":null,"url":null,"abstract":"Perception and reconstruction of our 3D physical environment is an essential task with broad applications for Augmented Reality (AR) displays. For example, reconstructed geometries are commonly leveraged for displaying 3D objects at accurate positions. While camera-captured images are a frequently used data source for realistically reconstructing 3D physical surroundings, they are limited to line-of-sight environments, requiring time-consuming and repetitive data-capture techniques to capture a full 3D picture. For instance, current AR devices require users to scan through a whole room to obtain its geometric sizes. This optical process is tedious and inapplicable when the space is occluded or inaccessible. Audio waves propagate through space by bouncing from different surfaces, but are not 'occluded' by a single object such as a wall, unlike light. In this research, we aim to ask the question ‘can one hear the size of a room?’. To answer that, we propose an approach for inferring room geometries only from a single sound, which we define as an audio wave sequence played from a single loud speaker, leveraging deep learning for decoding implicitly-carried spatial information from a single speaker-and-microphone system. Through a series of experiments and studies, our work demonstrates our method's effectiveness at inferring a 3D environment's spatial layout. Our work introduces a robust building block in multi-modal layout reconstruction.","PeriodicalId":16257,"journal":{"name":"Journal of Information Display","volume":null,"pages":null},"PeriodicalIF":3.7000,"publicationDate":"2022-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Information Display","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1080/15980316.2022.2145377","RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MATERIALS SCIENCE, MULTIDISCIPLINARY","Score":null,"Total":0}
Citations: 1

Abstract

Perception and reconstruction of our 3D physical environment are essential tasks with broad applications for Augmented Reality (AR) displays. For example, reconstructed geometries are commonly leveraged to display 3D objects at accurate positions. While camera-captured images are a frequently used data source for realistically reconstructing 3D physical surroundings, they are limited to line-of-sight environments and require time-consuming, repetitive capture procedures to obtain a full 3D picture. For instance, current AR devices require users to scan through a whole room to obtain its geometric dimensions. This optical process is tedious, and it is inapplicable when the space is occluded or inaccessible. Unlike light, audio waves propagate through space by bouncing off different surfaces and are not 'occluded' by a single object such as a wall. In this research, we ask the question 'can one hear the size of a room?'. To answer it, we propose an approach for inferring room geometries from only a single sound, which we define as an audio wave sequence played from a single loudspeaker, leveraging deep learning to decode the implicitly carried spatial information from a single speaker-and-microphone system. Through a series of experiments and studies, we demonstrate our method's effectiveness at inferring a 3D environment's spatial layout. Our work introduces a robust building block for multi-modal layout reconstruction.
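To make the described pipeline concrete, below is a minimal, hypothetical sketch of the kind of system the abstract outlines: a known excitation signal is played from a loudspeaker, the room's acoustic response is captured by a single microphone, and a small convolutional network regresses the room's dimensions from a spectrogram of that recording. The probe signal, the network (`RoomSizeNet`), and all hyperparameters here are illustrative assumptions, not the authors' published architecture; PyTorch is used only as a convenient deep-learning framework.

```python
# Hypothetical sketch of a single-sound room-scale pipeline: play a known
# excitation signal, record the room's response with one microphone, and
# regress (width, depth, height) from a spectrogram with a small CNN.
# RoomSizeNet and all hyperparameters are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn

def excitation_chirp(sr=48_000, duration=1.0):
    """A logarithmic sine sweep (100 Hz to 20 kHz), a common probe signal
    in room acoustics."""
    t = np.linspace(0.0, duration, int(sr * duration), endpoint=False)
    return np.sin(2 * np.pi * 100 * duration / np.log(200) *
                  (np.exp(t / duration * np.log(200)) - 1.0))

def log_spectrogram(recording, n_fft=512, hop=128):
    """Log-magnitude spectrogram of the microphone recording."""
    x = torch.as_tensor(recording, dtype=torch.float32)
    spec = torch.stft(x, n_fft=n_fft, hop_length=hop,
                      window=torch.hann_window(n_fft), return_complex=True)
    return torch.log1p(spec.abs())           # shape: (freq_bins, frames)

class RoomSizeNet(nn.Module):
    """Toy CNN mapping a spectrogram to three room dimensions in meters."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 3)          # regress width, depth, height

    def forward(self, spec):                  # spec: (batch, 1, freq, time)
        return self.head(self.features(spec).flatten(1))

# Usage: in practice `recording` is the microphone capture made while
# excitation_chirp() plays; random noise stands in here for a shape check.
recording = np.random.randn(48_000).astype(np.float32)
spec = log_spectrogram(recording).unsqueeze(0).unsqueeze(0)
pred_dims = RoomSizeNet()(spec)               # tensor of shape (1, 3)
```

A logarithmic sweep is a typical design choice for acoustic probing because it excites the full audible band and makes the room's impulse response, which carries the wall-reflection geometry, straightforward to recover by deconvolution.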
Source journal: Journal of Information Display (MATERIALS SCIENCE, MULTIDISCIPLINARY)
CiteScore: 7.10
Self-citation rate: 5.40%
Articles per year: 27
Review time: 30 weeks