On Temporal Context Information for Hybrid BLSTM-Based Phoneme Recognition

Timo Lohrenz, Maximilian Strake, T. Fingscheidt
{"title":"On Temporal Context Information for Hybrid BLSTM-Based Phoneme Recognition","authors":"Timo Lohrenz, Maximilian Strake, T. Fingscheidt","doi":"10.1109/ASRU46091.2019.9003946","DOIUrl":null,"url":null,"abstract":"The modern approach to include long-term temporal context information into speech recognition systems is the use of recurrent neural networks, e.g., bi-directional long short-term memory (BLSTM) networks. In this paper, we decouple the BLSTM from a preceding CNN-based feature extractor network allowing us to investigate the use of temporal context in both models in a modular fashion. Accordingly, we train the BLSTMs on posteriors, stemming from preceding CNNs which use various amounts of limited context in their input layer, and investigate to what extent the BLSTM is able to effectively make use of its long-term modeling capabilities. We show that it is beneficial to train the BLSTM on posteriors stemming from a temporal context-free acoustic model. Remarkably, the best performing combination of CNN acoustic model and BLSTM afterwards is a large-context CNN (expected), followed by a BLSTM which has been trained on context-free CNN output posteriors (surprising).","PeriodicalId":150913,"journal":{"name":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"382 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU46091.2019.9003946","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

The modern approach to including long-term temporal context information in speech recognition systems is the use of recurrent neural networks, e.g., bi-directional long short-term memory (BLSTM) networks. In this paper, we decouple the BLSTM from a preceding CNN-based feature extractor network, allowing us to investigate the use of temporal context in both models in a modular fashion. Accordingly, we train the BLSTMs on posteriors stemming from preceding CNNs that use various amounts of limited context in their input layers, and we investigate to what extent the BLSTM can effectively make use of its long-term modeling capabilities. We show that it is beneficial to train the BLSTM on posteriors stemming from a temporally context-free acoustic model. Remarkably, the best-performing combination of CNN acoustic model and subsequent BLSTM is a large-context CNN (expected), followed by a BLSTM that has been trained on context-free CNN output posteriors (surprising).
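No code accompanies this page; the following is a minimal PyTorch sketch of the modular setup the abstract describes: a CNN acoustic model with a configurable input-layer context window that emits framewise phoneme posteriors, and a BLSTM trained on those posterior sequences rather than on raw features. All class names, layer sizes, the feature dimension, and the context values are illustrative assumptions, not the authors' actual architecture.

```python
# Sketch under assumed dimensions: CNN acoustic model with a variable
# temporal context window, followed by a BLSTM over its output posteriors.
import torch
import torch.nn as nn

FEAT_DIM = 40       # assumption: e.g. 40 log-mel filterbank coefficients
NUM_PHONEMES = 39   # assumption: folded TIMIT-style phoneme set

class ContextCNN(nn.Module):
    """CNN acoustic model; `context` frames on each side of the center
    frame form its input window (context=0 -> temporally context-free)."""
    def __init__(self, context: int):
        super().__init__()
        self.context = context
        frames = 2 * context + 1
        self.features = nn.Sequential(
            # convolve over the whole time window and local frequency bands
            nn.Conv2d(1, 64, kernel_size=(frames, 8)),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 3)),
            nn.Flatten(),
        )
        flat = 64 * ((FEAT_DIM - 8 + 1) // 3)
        self.classifier = nn.Linear(flat, NUM_PHONEMES)

    def forward(self, windows: torch.Tensor) -> torch.Tensor:
        # windows: (B, 1, 2*context+1, FEAT_DIM) -> framewise log-posteriors
        return self.classifier(self.features(windows)).log_softmax(-1)

class PosteriorBLSTM(nn.Module):
    """BLSTM operating on sequences of CNN posteriors; it contributes the
    long-term temporal modeling that the limited-context CNN cannot."""
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.blstm = nn.LSTM(NUM_PHONEMES, hidden, num_layers=2,
                             bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, NUM_PHONEMES)

    def forward(self, posteriors: torch.Tensor) -> torch.Tensor:
        # posteriors: (B, T, NUM_PHONEMES), one CNN output per frame
        h, _ = self.blstm(posteriors)
        return self.out(h).log_softmax(-1)

if __name__ == "__main__":
    cnn_free = ContextCNN(context=0)   # temporally context-free CNN
    cnn_wide = ContextCNN(context=8)   # large-context CNN (assumed value)
    blstm = PosteriorBLSTM()           # would be trained on cnn_free outputs

    # Toy utterance: T frames, symmetrically padded for the wide CNN.
    T, ctx = 50, cnn_wide.context
    feats = torch.randn(T + 2 * ctx, FEAT_DIM)
    windows = feats.unfold(0, 2 * ctx + 1, 1)        # (T, FEAT_DIM, frames)
    windows = windows.permute(0, 2, 1).unsqueeze(1)  # (T, 1, frames, FEAT_DIM)

    # The abstract's best combination: large-context CNN posteriors fed to
    # a BLSTM that was trained on the context-free CNN's posteriors.
    post = cnn_wide(windows)                 # (T, NUM_PHONEMES)
    print(blstm(post.unsqueeze(0)).shape)    # torch.Size([1, T, NUM_PHONEMES])
```

Because the two models are decoupled, the CNN and the BLSTM can be trained independently and recombined afterwards, which is exactly the mix-and-match experiment the abstract reports: training the BLSTM on `cnn_free` posteriors but pairing it with `cnn_wide` at test time.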