Yoshiaki Bando, Katsutoshi Itoyama, M. Konyo, S. Tadokoro, K. Nakadai, Kazuyoshi Yoshii, Hiroshi G. Okuno
2016 24th European Signal Processing Conference (EUSIPCO), 28 November 2016. DOI: 10.1109/EUSIPCO.2016.7760402
Variational Bayesian multi-channel robust NMF for human-voice enhancement with a deformable and partially-occluded microphone array
This paper presents a human-voice enhancement method for a deformable and partially-occluded microphone array. Although microphone arrays distributed along the long bodies of hose-shaped rescue robots are crucial for finding victims under collapsed buildings, the human voices they capture are contaminated by non-stationary actuator and friction noise. Standard blind source separation methods cannot be used because the relative microphone positions change over time and some microphones are occasionally shaded by rubble. To solve these problems, we develop a Bayesian model that separates multichannel amplitude spectrograms into sparse and low-rank components (human voice and noise) without using phase information, which depends on the array layout. The voice level at each microphone is estimated in a time-varying manner to reduce the influence of the shaded microphones. Experiments using a 3-m hose-shaped robot with eight microphones show that our method outperforms conventional methods by 2.7 dB in signal-to-noise ratio.
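The core decomposition described in the abstract — splitting a nonnegative amplitude spectrogram into a low-rank part (structured noise) and a sparse part (voice) — can be illustrated with a simple point-estimate robust NMF. The sketch below is NOT the paper's variational Bayesian model (it omits the Bayesian priors, the multichannel pooling, and the time-varying per-microphone gains); it is a minimal single-channel analogue using multiplicative updates for the squared-error cost ||V − WH − S||² with an L1 penalty on S. The function name, rank, and sparsity weight are illustrative choices, not values from the paper.

```python
import numpy as np

def robust_nmf(V, rank=4, sparsity=0.1, n_iter=200, eps=1e-9):
    """Split a nonnegative spectrogram V (freq x time) into a low-rank
    component W @ H (noise) and a sparse component S (voice).

    Minimizes ||V - W H - S||_F^2 + sparsity * ||S||_1 with W, H, S >= 0,
    using multiplicative updates for W and H and a soft-threshold for S.
    An illustrative stand-in for the paper's variational Bayesian model.
    """
    rng = np.random.default_rng(0)
    F, T = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    S = np.zeros((F, T))
    for _ in range(n_iter):
        # Multiplicative update for W under the model V ~ W H + S:
        # gradient of the squared error w.r.t. W is (W H + S - V) H^T.
        R = W @ H + S + eps
        W *= (V @ H.T) / (R @ H.T + eps)
        R = W @ H + S + eps
        H *= (W.T @ V) / (W.T @ R + eps)
        # Soft-threshold the residual to obtain the sparse (voice) part.
        S = np.maximum(V - W @ H - sparsity, 0.0)
    return W, H, S
```

Energy that the rank-limited product W @ H cannot explain, such as brief voiced bursts, is pushed into S by the thresholding step; in the paper this separation is instead inferred jointly across channels so that occluded microphones contribute less to the voice estimate.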