Enhancing loudspeaker-based 3D audio with room modeling

2010 IEEE International Workshop on Multimedia Signal Processing. Pub Date: 2010-12-10. DOI: 10.1109/MMSP.2010.5661990

Myung-Suk Song, Cha Zhang, D. Florêncio, Hong-Goo Kang


Citations: 8

Abstract

For many years, spatial (3D) sound over headphones has been widely used in a number of applications. A rich spatial sensation is obtained by using head-related transfer functions (HRTFs) and playing the appropriate sound through headphones. In theory, loudspeaker audio systems can render 3D sound fields almost as rich as those of headphones, as long as the room impulse responses (RIRs) between the loudspeakers and the ears are known. In practice, however, obtaining these RIRs is hard, and the performance of loudspeaker-based systems is far from perfect. New hope has recently been raised by a system that tracks the user's head position and orientation and incorporates them into the RIR estimates in real time. That system made two simplifying assumptions: it used generic HRTFs, and it ignored room reverberation. In this paper we tackle the second problem: we incorporate a room reverberation estimate into the RIRs. This is a nontrivial task: RIRs vary significantly with the listener's position, and even if one could measure them at a few points, they are notoriously hard to interpolate. Instead, we take an indirect approach: we model the room, and from that model we obtain an estimate of the main reflections. The positions and characteristics of the walls do not vary with the user's movement, yet they allow an estimate of the RIR to be computed quickly for each new user position. Of course, the key question is whether the estimates are good enough. We show an improvement in localization perception of up to 32% (i.e., reducing the average error from 23.5° to 15.9°).
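The idea of deriving the main reflections from a fixed room model, and recomputing them for each new listener position, can be illustrated with a small sketch. The abstract does not say which room model the authors use; the code below assumes a first-order image-source model of a rectangular ("shoebox") room, which is one common choice and not necessarily theirs. All names (`first_order_images`, `early_rir`), the reflection coefficient, the sample rate, and the room dimensions are hypothetical values chosen for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def first_order_images(source, room_dims):
    """Mirror a source across the six walls of a shoebox room.

    Assumption: the room spans [0, Lx] x [0, Ly] x [0, Lz].
    """
    source = np.asarray(source, dtype=float)
    images = []
    for axis in range(3):
        for wall in (0.0, room_dims[axis]):
            image = source.copy()
            image[axis] = 2.0 * wall - source[axis]  # reflect across the wall plane
            images.append(image)
    return np.array(images)


def early_rir(source, listener, room_dims, reflection_coeff=0.7,
              sample_rate=16000, length=1024):
    """Direct path plus first-order reflections as a sparse impulse response.

    Assumption: every wall shares one frequency-independent reflection
    coefficient, and delays are rounded to the nearest sample.
    """
    source = np.asarray(source, dtype=float)
    listener = np.asarray(listener, dtype=float)
    rir = np.zeros(length)
    paths = [(source, 1.0)]
    paths += [(img, reflection_coeff)
              for img in first_order_images(source, room_dims)]
    for position, gain in paths:
        distance = np.linalg.norm(position - listener)
        delay = int(round(distance / SPEED_OF_SOUND * sample_rate))
        if delay < length:
            rir[delay] += gain / max(distance, 1e-3)  # 1/r spreading loss
    return rir


# The wall geometry is fixed; only the listener position changes per update.
room = (5.0, 4.0, 3.0)
loudspeaker = (1.0, 2.0, 1.5)
rir_a = early_rir(loudspeaker, (3.0, 2.0, 1.5), room)
rir_b = early_rir(loudspeaker, (3.5, 1.0, 1.2), room)
```

Because the image positions depend only on the loudspeaker and the walls, they could be cached once, leaving just a handful of distance computations per head-tracking update. A complete system would still need to combine such an estimate with HRTFs for each ear, which this sketch leaves out.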
