Speech recognition features based on deep latent Gaussian models

2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), Pub Date: 2017-09-01, DOI: 10.1109/MLSP.2017.8168174

Andros Tjandra, S. Sakti, Satoshi Nakamura

Citations: 0

Abstract

This paper constructs speech features from a generative model, the deep latent Gaussian model (DLGM), which is trained using the stochastic gradient variational Bayes (SGVB) algorithm and performs efficient approximate inference and learning in a directed probabilistic graphical model. The trained DLGM then generates latent variables based on a Gaussian distribution, which are used as new features for a deep neural network (DNN) acoustic model. We compare results with and without the DLGM-transformed features and also examine the benefit of combining the proposed and original features in a single DNN. Our experimental results show that the proposed DLGM features improve automatic speech recognition (ASR) performance. Furthermore, the DNN acoustic model that combines the proposed and original features gives the best performance.
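The feature pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`gaussian_encoder`, `sample_latent`, `combine_features`), the single affine layer, the toy dimensions, and the choice to draw a sample rather than use the mean are all assumptions made for illustration. The abstract states only that Gaussian latent variables are generated and combined with the original features.

```python
import math
import random


def affine(weights, bias, x):
    """Return weights @ x + bias for plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def gaussian_encoder(x, w_mu, b_mu, w_logvar, b_logvar):
    """Hypothetical one-layer inference network for q(z | x).

    A real DLGM would stack several nonlinear layers; a single affine
    map per output is used here only to keep the sketch short.
    """
    mu = affine(w_mu, b_mu, x)
    logvar = affine(w_logvar, b_logvar, x)
    return mu, logvar


def sample_latent(mu, logvar, rng):
    """Reparameterized draw z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def combine_features(x, z):
    """Concatenate the original frame with its latent features."""
    return list(x) + list(z)


# Toy usage: a 3-dimensional "acoustic frame" and a 2-dimensional latent.
rng = random.Random(0)
frame = [0.5, -1.0, 2.0]
w_mu = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
b_mu = [0.0, 0.5]
w_logvar = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
b_logvar = [0.0, 0.0]

mu, logvar = gaussian_encoder(frame, w_mu, b_mu, w_logvar, b_logvar)
z = sample_latent(mu, logvar, rng)
dnn_input = combine_features(frame, z)  # 3 original + 2 latent dimensions
```

Writing the sample as `mu + sigma * eps` is the reparameterization that lets SGVB pass gradients through the sampling step during training. The concatenated vector corresponds to the combined-feature setting that the abstract reports as giving the best performance.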


