Neural Ambisonic Encoding for Multi-Speaker Scenarios Using a Circular Microphone Array
Yue Qiao, Vinay Kothapally, Meng Yu, Dong Yu
arXiv - EE - Audio and Speech Processing, 2024-09-11
DOI: https://doi.org/arxiv-2409.06954
Citations: 0
Abstract
Spatial audio formats like Ambisonics are playback device layout-agnostic and
well-suited for applications such as teleconferencing and virtual reality.
Conventional Ambisonic encoding methods often rely on spherical microphone
arrays for efficient sound field capture, which limits their flexibility in
practical scenarios. We propose a deep learning (DL)-based approach, leveraging
a two-stage network architecture for encoding circular microphone array signals
into second-order Ambisonics (SOA) in multi-speaker environments. In addition,
we introduce: (i) a novel loss function based on spatial power maps to
regularize inter-channel correlations of the Ambisonic signals, and (ii) a
channel permutation technique to resolve the ambiguity of encoding vertical
information using a horizontal circular array. Evaluation on simulated speech
and noise datasets shows that our approach consistently outperforms traditional
signal processing (SP) and DL-based methods, providing significantly better
timbral and spatial quality and higher source localization accuracy. Binaural
audio demos with visualizations are available at
https://bridgoon97.github.io/NeuralAmbisonicEncoding/.
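The paper's encoder is a neural network, whose details are not given in the abstract. For context, the second-order Ambisonic (SOA) target it produces is a 9-channel spherical-harmonic representation; the sketch below shows how a single plane-wave source at a known direction is conventionally encoded into SOA, assuming the common AmbiX convention (ACN channel order, SN3D normalization). The function names are illustrative, not from the paper.

```python
import numpy as np

def soa_gains(az, el):
    """Real spherical-harmonic gains up to order 2 for a plane-wave
    source at azimuth `az` and elevation `el` (radians), in ACN
    channel order with SN3D normalization (AmbiX convention)."""
    return np.array([
        1.0,                                             # ACN 0: W (omni)
        np.sin(az) * np.cos(el),                         # ACN 1: Y
        np.sin(el),                                      # ACN 2: Z
        np.cos(az) * np.cos(el),                         # ACN 3: X
        np.sqrt(3) / 2 * np.sin(2 * az) * np.cos(el)**2, # ACN 4: V
        np.sqrt(3) / 2 * np.sin(az) * np.sin(2 * el),    # ACN 5: T
        0.5 * (3 * np.sin(el)**2 - 1),                   # ACN 6: R
        np.sqrt(3) / 2 * np.cos(az) * np.sin(2 * el),    # ACN 7: S
        np.sqrt(3) / 2 * np.cos(2 * az) * np.cos(el)**2, # ACN 8: U
    ])

def encode_soa(mono, az, el):
    """Encode a mono signal into 9-channel SOA for one point source
    at (az, el); returns an array of shape (9, n_samples)."""
    return soa_gains(az, el)[:, None] * np.asarray(mono)[None, :]
```

A horizontal circular array observes no elevation-dependent phase differences, which is why the height-carrying channels (Z, T, S here) are ambiguous from such a capture; the paper's channel permutation technique addresses exactly this.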