Learning state labels for sparse classification of speech with matrix deconvolution

2013 IEEE Workshop on Automatic Speech Recognition and Understanding. Pub Date: 2013-12-01. DOI: 10.1109/ASRU.2013.6707724

Antti Hurmalainen, T. Virtanen

Citations: 6

Abstract

Non-negative spectral factorisation with long temporal context has been successfully used for noise-robust recognition of speech in multi-source environments. Sparse classification from activations of speech atoms can be employed instead of conventional GMMs to determine speech state likelihoods. For accurate classification, correct linguistic state labels must be assigned to speech atoms. We propose using non-negative matrix deconvolution for learning the labels with algorithms closely matching a framework that separates speech from additive noise. Experiments on the 1st CHiME Challenge corpus show improvement in recognition accuracy over labels acquired from original atom sources or from previously used least-squares regression. The new approach also circumvents numerical issues encountered in previous learning methods, and opens up possibilities for new speech basis generation algorithms.
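
The sparse classification step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes single-frame atoms and plain non-negative factorisation with a fixed dictionary (standard KL-divergence multiplicative updates with an L1 penalty), whereas the paper uses long temporal context and matrix deconvolution, and it uses hand-set labels where the paper learns them. The names `estimate_activations`, `state_scores`, `B`, `Q`, and the `sparsity` value are hypothetical and chosen only for illustration.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def estimate_activations(Y, B, sparsity=0.1, n_iter=200):
    """Find X >= 0 with Y ~= B @ X by KL-divergence multiplicative updates.

    Y: (F, T) non-negative features, B: (F, K) fixed atoms.
    `sparsity` is an L1 penalty weight on X (illustrative value).
    """
    X = np.ones((B.shape[1], Y.shape[1]))
    denom = B.sum(axis=0)[:, None] + sparsity  # B^T 1 + lambda
    for _ in range(n_iter):
        X *= (B.T @ (Y / (B @ X + EPS))) / denom
    return X


def state_scores(X, Q):
    """Map atom activations to per-frame state scores via label matrix Q (S, K)."""
    S = Q @ X
    return S / (S.sum(axis=0, keepdims=True) + EPS)


# Toy data (hypothetical): 4 atoms with mostly disjoint spectral support.
B = np.full((8, 4), 0.01)
for k in range(4):
    B[2 * k:2 * k + 2, k] = 1.0

# Hand-set labels: atoms 0-1 belong to state 0, atoms 2-3 to state 1.
Q = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])

# Frame 0 is generated by atom 0, frame 1 by atom 3.
Y = B[:, [0, 3]]

X = estimate_activations(Y, B)
scores = state_scores(X, Q)
print(scores.argmax(axis=0))  # most likely state index per frame
```

In this toy setting the label matrix `Q` is fixed by hand, which corresponds roughly to taking labels from the original atom sources; the approach proposed in the paper would instead learn the atom-to-state mapping from data.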
