Improving Very Deep Time-Delay Neural Network With Vertical-Attention For Effectively Training CTC-Based ASR Systems

Sheng Li, Xugang Lu, R. Takashima, Peng Shen, Tatsuya Kawahara, H. Kawai
{"title":"Improving Very Deep Time-Delay Neural Network With Vertical-Attention For Effectively Training CTC-Based ASR Systems","authors":"Sheng Li, Xugang Lu, R. Takashima, Peng Shen, Tatsuya Kawahara, H. Kawai","doi":"10.1109/SLT.2018.8639675","DOIUrl":null,"url":null,"abstract":"The very deep neural network has recently been proposed for speech recognition and achieves significant performance. It has excellent potential for integration with end-to-end (E2E) training. Connectionist temporal classification (CTC) has shown great potential in E2E acoustic modeling. In this study, we investigate deep architectures and techniques which are suitable for CTC-based acoustic modeling. We propose a very deep residual time-delay CTC neural network (VResTD-CTC). How to select a suitable deep architecture optimized with the CTC objective function is crucial for obtaining the state of the art performance. Excellent performances can be obtained by selecting deep architecture for non-E2E ASR systems modeling with tied-triphone states. However, these optimized structures do not guarantee to achieve better or comparable performances on E2E (e.g., CTC-based) systems modeling with dynamic acoustic units. For solving this problem and further leveraging the system performance, we introduce the vertical-attention mechanism to reweight the residual blocks at each time step. Speech recognition experiments show our proposed model significantly outperforms the DNN and LSTM-based (both bidirectional and unidirectional) CTC baseline models.","PeriodicalId":377307,"journal":{"name":"2018 IEEE Spoken Language Technology Workshop (SLT)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE Spoken Language Technology Workshop (SLT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SLT.2018.8639675","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3

Abstract

Very deep neural networks have recently been proposed for speech recognition and achieve strong performance. They also hold excellent potential for integration with end-to-end (E2E) training, and connectionist temporal classification (CTC) has shown great promise for E2E acoustic modeling. In this study, we investigate deep architectures and techniques suitable for CTC-based acoustic modeling, and we propose a very deep residual time-delay CTC neural network (VResTD-CTC). Selecting a deep architecture well matched to the CTC objective function is crucial for obtaining state-of-the-art performance. Deep architectures chosen for non-E2E ASR systems, which model tied-triphone states, can achieve excellent performance; however, these optimized structures are not guaranteed to perform better, or even comparably, on E2E systems (e.g., CTC-based ones) that model dynamic acoustic units. To solve this problem and further improve system performance, we introduce a vertical-attention mechanism that reweights the outputs of the residual blocks at each time step. Speech recognition experiments show that our proposed model significantly outperforms DNN-based and LSTM-based (both bidirectional and unidirectional) CTC baseline models.
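To make the vertical-attention idea concrete, the following is a minimal PyTorch sketch of reweighting residual-block outputs across the depth ("vertical") axis at each time step. The module name, layer sizes, and the linear scoring function are illustrative assumptions, not the authors' exact architecture.

```python
# Hypothetical sketch: vertical attention over residual-block outputs.
# At each time step, the outputs of the stacked residual blocks are
# re-weighted by learned attention weights before the CTC output layer.
import torch
import torch.nn as nn

class VerticalAttention(nn.Module):
    def __init__(self, hidden_dim: int):
        super().__init__()
        # One scalar score per residual-block output at each time step
        # (assumed scoring function; the paper may use a different one).
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, block_outputs: torch.Tensor) -> torch.Tensor:
        # block_outputs: (num_blocks, batch, time, hidden_dim)
        scores = self.score(block_outputs)        # (num_blocks, batch, time, 1)
        weights = torch.softmax(scores, dim=0)    # normalize across depth axis
        # Weighted sum over the vertical (depth) axis -> (batch, time, hidden_dim)
        return (weights * block_outputs).sum(dim=0)

# Usage: fuse the per-block outputs of a deep residual TDNN stack,
# then project the fused features to CTC output labels.
num_blocks, batch, time, hidden = 6, 4, 100, 256
outputs = torch.randn(num_blocks, batch, time, hidden)
fused = VerticalAttention(hidden)(outputs)        # (4, 100, 256)
logits = nn.Linear(hidden, 40)(fused)             # e.g., 40 CTC labels + blank
```

Under this reading, the softmax over the depth axis lets the network choose, per frame, how much each residual block contributes to the final representation, rather than always using only the topmost block.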