Reverberation and Noise Robust Feature Compensation Based on IMM

IEEE Transactions on Audio, Speech, and Language Processing. Pub Date: 2013-08-01. DOI: 10.1109/TASL.2013.2256893

C. Han, S. Kang, N. Kim

Citations: 11

Abstract

In this paper, we propose a novel feature compensation approach based on the interacting multiple model (IMM) algorithm, designed specifically for the joint processing of background noise and acoustic reverberation. To cope with time-varying environmental parameters, we establish a switching linear dynamic model in the log-spectral domain for the additive and convolutive distortions, such as background noise and acoustic reverberation. We construct multiple state-space models of the speech corruption process, in which the log spectra of clean speech and the log frequency response of the acoustic reverberation are jointly treated as the state of interest. The proposed approach shows significant improvements on the Aurora-5 automatic speech recognition (ASR) task, which was developed to investigate how hands-free speech input in noisy room environments affects ASR performance.
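To make the ingredients named in the abstract more concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' implementation. The function names (`corrupt_log_spectrum`, `imm_mix`, `imm_update_probs`) are hypothetical. The distortion relation y = log(exp(x + h) + exp(n)) is the commonly used log-spectral model for a convolutive term h and an additive term n, and is assumed here rather than taken from the paper. The mixing and probability-update steps follow the generic textbook IMM recursion for a set of Gaussian model-conditioned estimates.

```python
import numpy as np

def corrupt_log_spectrum(x, h, n):
    """Assumed log-spectral corruption model: y = log(exp(x + h) + exp(n)).

    x: clean-speech log spectrum, h: log frequency response of the
    convolutive distortion, n: log spectrum of the additive noise.
    """
    return np.logaddexp(x + h, n)

def imm_mix(mu, trans, means, covs):
    """Generic IMM interaction (mixing) step.

    mu:    (M,)      previous model probabilities
    trans: (M, M)    trans[i, j] = P(model j now | model i before)
    means: (M, D)    previous model-conditioned state means
    covs:  (M, D, D) previous model-conditioned state covariances
    Returns predicted model probabilities and the mixed initial
    mean/covariance for each of the M model-matched filters.
    """
    c = trans.T @ mu                       # predicted model probabilities
    w = trans * mu[:, None] / c[None, :]   # w[i, j]: weight of model i for filter j
    mixed_means = w.T @ means
    mixed_covs = np.empty_like(covs)
    for j in range(len(mu)):
        d = means - mixed_means[j]         # spread of each mean around the mixed mean
        spread = covs + d[:, :, None] * d[:, None, :]
        mixed_covs[j] = np.tensordot(w[:, j], spread, axes=1)
    return c, mixed_means, mixed_covs

def imm_update_probs(c, likelihoods):
    """Posterior model probabilities from per-model observation likelihoods."""
    post = c * likelihoods
    return post / post.sum()

if __name__ == "__main__":
    # Toy two-model example with a one-dimensional state (illustrative values only).
    mu = np.array([0.5, 0.5])
    trans = np.array([[0.9, 0.1], [0.1, 0.9]])
    means = np.array([[0.0], [2.0]])
    covs = np.ones((2, 1, 1))
    c, m0, P0 = imm_mix(mu, trans, means, covs)
    print(c, m0.ravel(), P0.ravel())
    print(imm_update_probs(c, np.array([3.0, 1.0])))
```

In a full IMM cycle, each model-matched filter would start from its mixed mean and covariance, run its own prediction and update on the observed log spectrum, and report a likelihood that feeds `imm_update_probs`. Because the corruption model above is nonlinear in the state, a practical filter would need some form of linearization; the abstract does not specify that step, so it is left out of this sketch.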




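Source journal

IEEE Transactions on Audio Speech and Language Processing (Engineering Technology, Engineering: Electrical & Electronic)

Self-citation rate

0.00%

Number of articles published

Review time

24.0 months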

Journal description: The IEEE Transactions on Audio, Speech and Language Processing covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language. In particular, audio processing also covers auditory modeling, acoustic modeling and source separation. Speech processing also covers speech production and perception, adaptation, lexical modeling and speaker recognition. Language processing also covers spoken language understanding, translation, summarization, mining, general language modeling, as well as spoken dialog systems.