Environment-aware ideal binary mask estimation using monaural cues

2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. Pub Date: 2013-10-01. DOI: 10.1109/WASPAA.2013.6701821

T. May, T. Dau

Citations: 15

Abstract

We present a monaural approach to speech segregation that estimates the ideal binary mask (IBM) by combining amplitude modulation spectrogram (AMS) features, pitch-based features and speech presence probability (SPP) features derived from noise statistics. To maintain a high mask estimation accuracy in the presence of various background noises, the system employs environment-specific segregation models and automatically selects the appropriate model for a given input signal. Furthermore, instead of classifying each time-frequency (T-F) unit independently, the a posteriori probabilities of speech and noise presence are evaluated by considering adjacent T-F units. The proposed system achieves high classification accuracy.
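The two mask-related ideas in the abstract can be illustrated with a minimal sketch. The first function uses the common definition of the IBM (a T-F unit is labelled 1 when its local SNR exceeds a local criterion, here the hypothetical parameter `lc_db`). The second and third functions show one simple way to take adjacent T-F units into account: averaging the speech posterior over a small neighbourhood before thresholding. The abstract does not specify how the authors integrate neighbouring units, so the averaging rule, the function names, and the parameters `radius` and `threshold` are assumptions made for illustration only.

```python
import math


def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Return the IBM for power values indexed [channel][frame].

    A unit is 1 when its local SNR in dB exceeds lc_db, else 0.
    lc_db (local criterion) is an illustrative parameter name.
    """
    eps = 1e-12  # avoids log of zero and division by zero
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_row, n_row):
            snr_db = 10.0 * math.log10((s + eps) / (n + eps))
            row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask


def smooth_posteriors(post, radius=1):
    """Average speech posteriors over a square T-F neighbourhood.

    Assumption: plain averaging, clipped at the edges. This is not
    taken from the paper; it only illustrates using adjacent units.
    """
    n_ch, n_fr = len(post), len(post[0])
    out = [[0.0] * n_fr for _ in range(n_ch)]
    for c in range(n_ch):
        for t in range(n_fr):
            vals = [
                post[i][j]
                for i in range(max(0, c - radius), min(n_ch, c + radius + 1))
                for j in range(max(0, t - radius), min(n_fr, t + radius + 1))
            ]
            out[c][t] = sum(vals) / len(vals)
    return out


def estimate_mask(post, radius=1, threshold=0.5):
    """Threshold the neighbourhood-smoothed posteriors into a binary mask."""
    smoothed = smooth_posteriors(post, radius)
    return [[1 if p > threshold else 0 for p in row] for row in smoothed]
```

In this sketch, a single unit with a low posterior that is surrounded by confident speech units is relabelled as speech after smoothing, whereas classifying each unit independently would label it as noise.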

