Wishart Localization Prior On Spatial Covariance Matrix In Ambisonic Source Separation Using Non-Negative Tensor Factorization

ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Pub Date: 2022-05-23. DOI: 10.1109/ICASSP43922.2022.9746222

Mateusz Guzik, K. Kowalczyk


Citations: 3

Abstract

This paper extends an existing Non-negative Tensor Factorization (NTF) based method for sound source separation under reverberant conditions, formulated for Ambisonic microphone mixture signals. In particular, we address how best to exploit prior knowledge of the source locations by formulating a suitable Maximum a Posteriori (MAP) framework. In the presented approach, the magnitude spectrograms are modelled by the NTF, and each source's Spatial Covariance Matrix (SCM) is approximated as a sum of anechoic Spherical Harmonic (SH) components weighted by the so-called spatial selector. We constrain the SCM using the Wishart distribution, which leads to a new posterior probability and, in turn, to extended update rules. The proposed solution avoids an issue in the original method caused by its empirical binary initialization of the spatial selector weights: because the update rules are multiplicative, a weight initialized to zero remains zero, so sound arriving from certain directions may never be taken into account. The proposed method is evaluated against the original algorithm and a recently proposed Expectation Maximization (EM) algorithm that also incorporates a spatial localization prior, and it shows improved separation performance in experiments with first-order Ambisonic recordings of musical instruments and speech utterances.
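The SCM model and the initialization issue described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm. It assumes a first-order SH vector in ACN order with SN3D normalization, a hypothetical grid of eight horizontal directions, and a generic Lee-Seung style multiplicative update under a Frobenius cost as a stand-in for the paper's update rules. The helper names (`foa_sh_vector`, `scm_from_selector`, `multiplicative_fit`) are illustrative only.

```python
import numpy as np

def foa_sh_vector(azimuth, elevation):
    """Real first-order SH vector. ACN order (W, Y, Z, X) with SN3D
    normalization is an assumed convention, not taken from the paper."""
    return np.array([
        1.0,
        np.sin(azimuth) * np.cos(elevation),
        np.sin(elevation),
        np.cos(azimuth) * np.cos(elevation),
    ])

def scm_from_selector(weights, sh_vectors):
    """SCM as a sum of rank-1 anechoic SH components y_d y_d^T,
    each weighted by its spatial selector weight."""
    return np.einsum("d,di,dj->ij", weights, sh_vectors, sh_vectors)

def multiplicative_fit(target_scm, sh_vectors, w_init, n_iter=200, eps=1e-12):
    """Stand-in multiplicative update minimizing the Frobenius distance
    between target_scm and the weighted sum (NOT the paper's MAP rules)."""
    # Frobenius inner products between components: (y_d . y_d')^2
    gram = (sh_vectors @ sh_vectors.T) ** 2
    # Inner products with the target: y_d^T R y_d (clipped against rounding)
    numer = np.maximum(((sh_vectors @ target_scm) * sh_vectors).sum(axis=1), 0.0)
    w = np.array(w_init, dtype=float)
    for _ in range(n_iter):
        w = w * numer / (gram @ w + eps)
    return w

# Hypothetical grid: 8 directions on the horizontal plane, 45 degrees apart.
azimuths = np.deg2rad(np.arange(0, 360, 45))
sh_vectors = np.stack([foa_sh_vector(az, 0.0) for az in azimuths])

# Target SCM: a single anechoic source at grid index 2 (azimuth 90 degrees).
true_dir = 2
target = np.outer(sh_vectors[true_dir], sh_vectors[true_dir])

# Binary initialization that selects the wrong direction (index 0 only).
w_binary_init = np.zeros(len(azimuths))
w_binary_init[0] = 1.0
w_binary = multiplicative_fit(target, sh_vectors, w_binary_init)

# Strictly positive initialization: every direction can still be adjusted.
w_uniform = multiplicative_fit(target, sh_vectors, np.ones(len(azimuths)))

print("binary init, weight at true direction: ", w_binary[true_dir])
print("uniform init, weight at true direction:", w_uniform[true_dir])
```

With the binary initialization, the weight at the true direction is multiplied by a finite factor at every iteration and therefore stays exactly zero, whereas a strictly positive initialization leaves it free to grow. This is the failure mode that, according to the abstract, the Wishart prior on the SCM is meant to avoid.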



