{"title":"On Temporal Context Information for Hybrid BLSTM-Based Phoneme Recognition","authors":"Timo Lohrenz, Maximilian Strake, T. Fingscheidt","doi":"10.1109/ASRU46091.2019.9003946","DOIUrl":null,"url":null,"abstract":"The modern approach to include long-term temporal context information into speech recognition systems is the use of recurrent neural networks, e.g., bi-directional long short-term memory (BLSTM) networks. In this paper, we decouple the BLSTM from a preceding CNN-based feature extractor network allowing us to investigate the use of temporal context in both models in a modular fashion. Accordingly, we train the BLSTMs on posteriors, stemming from preceding CNNs which use various amounts of limited context in their input layer, and investigate to what extent the BLSTM is able to effectively make use of its long-term modeling capabilities. We show that it is beneficial to train the BLSTM on posteriors stemming from a temporal context-free acoustic model. Remarkably, the best performing combination of CNN acoustic model and BLSTM afterwards is a large-context CNN (expected), followed by a BLSTM which has been trained on context-free CNN output posteriors (surprising).","PeriodicalId":150913,"journal":{"name":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"382 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU46091.2019.9003946","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
The modern approach to including long-term temporal context information in speech recognition systems is the use of recurrent neural networks, e.g., bi-directional long short-term memory (BLSTM) networks. In this paper, we decouple the BLSTM from a preceding CNN-based feature extractor network, allowing us to investigate the use of temporal context in both models in a modular fashion. Accordingly, we train BLSTMs on posteriors stemming from preceding CNNs that use varying amounts of limited context in their input layers, and investigate to what extent the BLSTM can effectively exploit its long-term modeling capabilities. We show that it is beneficial to train the BLSTM on posteriors stemming from a temporal context-free acoustic model. Remarkably, the best-performing combination of CNN acoustic model and subsequent BLSTM is a large-context CNN (expected), followed by a BLSTM that has been trained on context-free CNN output posteriors (surprising).
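The decoupled pipeline the abstract describes can be made concrete with a short sketch. The following PyTorch code is a minimal illustration, not the authors' implementation: the class names (ContextCNN, PosteriorBLSTM), layer widths, and the output dimension (41, standing in for the phoneme or HMM-state inventory) are assumptions chosen for illustration. It shows the two ideas at play: the CNN's temporal context is capped by a fixed input window of +/- context frames, while the BLSTM runs over the CNN's frame-wise posteriors and contributes long-term context on top.

import torch
import torch.nn as nn

class ContextCNN(nn.Module):
    # Hypothetical CNN acoustic model whose only temporal context is a
    # window of +/- `context` frames in the input layer; context=0 yields
    # the temporal context-free acoustic model discussed in the paper.
    def __init__(self, num_features=40, num_phonemes=41, context=0):
        super().__init__()
        k = 2 * context + 1  # temporal kernel spans the full input window
        self.conv = nn.Conv1d(num_features, 128, kernel_size=k, padding=context)
        self.out = nn.Conv1d(128, num_phonemes, kernel_size=1)

    def forward(self, feats):               # feats: (batch, num_features, T)
        h = torch.relu(self.conv(feats))
        return self.out(h)                  # frame-wise logits: (batch, num_phonemes, T)

class PosteriorBLSTM(nn.Module):
    # BLSTM trained on the CNN's frame-wise posteriors, adding long-term
    # (in principle unbounded) temporal modeling on top of the CNN.
    def __init__(self, num_phonemes=41, hidden=256):
        super().__init__()
        self.blstm = nn.LSTM(num_phonemes, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, num_phonemes)

    def forward(self, posteriors):          # posteriors: (batch, T, num_phonemes)
        h, _ = self.blstm(posteriors)
        return self.out(h)                  # refined logits: (batch, T, num_phonemes)

# Modular combination: here a context-free CNN (context=0) feeds the BLSTM,
# the training configuration the paper reports as beneficial.
cnn = ContextCNN(context=0)
blstm = PosteriorBLSTM()
feats = torch.randn(8, 40, 100)                 # 8 utterances, 40 features, 100 frames
posteriors = cnn(feats).softmax(dim=1)          # (8, 41, 100)
logits = blstm(posteriors.transpose(1, 2))      # (8, 100, 41)

Because the two models only communicate through the posterior sequence, the CNN's input context and the BLSTM can be varied independently, which is what enables the paper's modular investigation.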