An expectation-maximization eigenvector clustering approach to direction of arrival estimation of multiple speech sources

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Pub Date: 2016-05-19. DOI: 10.1109/ICASSP.2016.7472895

Xiong Xiao, Shengkui Zhao, Thi Ngoc Tho Nguyen, Douglas L. Jones, Chng Eng Siong, Haizhou Li


Citations: 5

Abstract


This paper presents an eigenvector clustering approach for estimating the direction of arrival (DOA) of multiple speech signals using a microphone array. Existing clustering approaches usually use only low frequencies to avoid spatial aliasing. In this study, we propose a probabilistic eigenvector clustering approach that uses all frequencies. In our work, time-frequency (TF) bins dominated by only one source are first detected using a combination of noise-floor tracking, onset detection and a coherence test. For each selected TF bin, the principal eigenvector of its spatial covariance matrix (the eigenvector of the largest eigenvalue) is extracted for clustering. A mixture density model is introduced to model the distribution of the eigenvectors, where each component distribution corresponds to one source and is parameterized by the source DOA. To use eigenvectors of all frequencies, the steering vectors of the sources at all frequencies are used in the distribution function. The DOAs of the sources can then be estimated by maximizing the likelihood of the eigenvectors using an expectation-maximization (EM) algorithm. Simulation and experimental results show that the proposed approach significantly reduces the root-mean-square error (RMSE) of DOA estimation for multiple speech sources compared to the MUSIC algorithm applied to the single-source-dominated TF bins and to our previous clustering approach.
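The clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the form of the component distribution, so everything specific here is an assumption made for illustration: the density `exp(kappa * |a^H v|^2)` (a Watson-like form whose normalizer does not depend on the DOA), the concentration `kappa`, the grid-search M-step, the 4-microphone uniform linear array, and all function names. It is not the authors' implementation, and the single-source TF-bin detection stage is replaced by synthetic, noise-free eigenvectors.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed)

def steering_vector(theta, freq, mic_pos):
    """Unit-norm far-field steering vector of a linear array (theta in radians)."""
    delays = mic_pos * np.cos(theta) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq * delays) / np.sqrt(len(mic_pos))

def principal_eigenvector(cov):
    """Eigenvector belonging to the largest eigenvalue of a Hermitian matrix."""
    _, vecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    return vecs[:, -1]

def em_doa(eigvecs, freqs, mic_pos, init_thetas, grid, kappa=20.0, n_iter=10):
    """EM over a DOA-parameterized mixture (hypothetical density, see lead-in).

    eigvecs[n] is the principal eigenvector of TF bin n, observed at freqs[n].
    """
    # sim[n, g] = |a(grid[g], freqs[n])^H v_n|^2, a value in [0, 1]
    A = np.array([[steering_vector(t, f, mic_pos) for t in grid] for f in freqs])
    sim = np.abs(np.einsum("ngm,nm->ng", A.conj(), eigvecs)) ** 2
    idx = np.array([np.argmin(np.abs(grid - t)) for t in init_thetas])
    weights = np.full(len(idx), 1.0 / len(idx))
    for _ in range(n_iter):
        # E-step: posterior probability that bin n belongs to source k
        log_p = kappa * sim[:, idx] + np.log(weights + 1e-12)
        log_p -= log_p.max(axis=1, keepdims=True)
        post = np.exp(log_p)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: per source, pick the grid DOA with the largest weighted similarity
        idx = np.argmax(post.T @ sim, axis=1)
        weights = post.mean(axis=0)
    return grid[idx], weights

# Synthetic usage: two sources, each TF bin dominated by exactly one of them.
mic_pos = np.arange(4) * 0.05                # 4 mics, 5 cm spacing (assumed)
freqs = np.linspace(500.0, 4000.0, 60)       # one frequency per selected TF bin
true_doas = np.deg2rad([60.0, 120.0])
eigvecs = []
for n, f in enumerate(freqs):
    a = steering_vector(true_doas[n % 2], f, mic_pos)
    cov = np.outer(a, a.conj()) + 0.01 * np.eye(len(mic_pos))  # source + white noise
    eigvecs.append(principal_eigenvector(cov))
eigvecs = np.array(eigvecs)

grid = np.deg2rad(np.arange(0, 181))         # 1-degree candidate DOAs
est_doas, est_weights = em_doa(eigvecs, freqs, mic_pos,
                               np.deg2rad([50.0, 130.0]), grid)
print(np.sort(np.rad2deg(est_doas)), est_weights)
```

Because each bin is compared against the steering vector at its own frequency, bins from different frequencies contribute to the same DOA-level likelihood, which is the idea the abstract describes for using all frequencies. With real recordings the eigenvectors would be noisy and the choice of density and initialization would matter far more than in this toy setup.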
