Maxout neurons based deep bidirectional LSTM for acoustic modeling

2017 IEEE International Conference on Robotics and Biomimetics (ROBIO) | Pub Date: 2017-12-01 | DOI: 10.1109/ROBIO.2017.8324646

Yuan Luo, Yu Liu, Yi Zhang, Boyu Wang, Zhou Ye


Citations: 2

Abstract



Recently, long short-term memory (LSTM) recurrent neural networks (RNNs) have achieved great success as acoustic models in large-vocabulary continuous speech recognition systems. In this paper, we propose an improved hybrid acoustic model based on a deep bidirectional long short-term memory (DBLSTM) RNN. In this model, maxout neurons are used in the fully connected part of the DBLSTM to address the vanishing and exploding gradient problems. At the same time, dropout regularization is used to avoid over-fitting during training. In addition, to accommodate the bidirectional dependence of the DBLSTM at each time step, a context-sensitive-chunk (CSC) back-propagation through time (BPTT) algorithm is proposed to train the network. Simulation experiments were conducted on the Switchboard benchmark task. The results show that the improved hybrid acoustic model achieves a word error rate (WER) of 14.5%, and the optimal network structures and CSC configurations are given.
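The ingredients named in the abstract can be illustrated with a small, framework-free sketch. It is not the authors' implementation: the abstract gives no layer sizes, number of maxout pieces, dropout rate, or chunk lengths, so every function name, parameter, and value below is a hypothetical choice. The sketch assumes the standard definitions: a maxout unit outputs the maximum over several affine pieces, dropout is applied in its "inverted" form, and a context-sensitive chunk is a fixed-length block of frames with a few extra frames appended on each side for context only.

```python
import random


def maxout(x, weights, biases):
    """Maxout layer: each output unit is the max over its affine pieces.

    x:       list of d input values
    weights: weights[j][p] is the d-length weight vector of piece p of unit j
    biases:  biases[j][p] is the bias of piece p of unit j
    """
    out = []
    for unit_w, unit_b in zip(weights, biases):
        pieces = [
            sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
            for w, b in zip(unit_w, unit_b)
        ]
        out.append(max(pieces))
    return out


def dropout(h, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`,
    and scale the surviving units by 1 / (1 - rate)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(h)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in h]


def csc_chunks(num_frames, chunk_len, left_ctx, right_ctx):
    """Split an utterance into context-sensitive chunks.

    Returns (ctx_start, start, end, ctx_end) tuples with exclusive ends.
    Frames in [start, end) are the chunk proper; frames in
    [ctx_start, start) and [end, ctx_end) only supply context
    (assumption: context is truncated at utterance boundaries).
    """
    chunks = []
    for start in range(0, num_frames, chunk_len):
        end = min(start + chunk_len, num_frames)
        chunks.append((max(0, start - left_ctx), start, end,
                       min(num_frames, end + right_ctx)))
    return chunks


if __name__ == "__main__":
    # Two maxout units with two pieces each, on a 2-dimensional input.
    w = [[[1.0, 0.0], [0.0, 1.0]], [[-1.0, 0.0], [0.0, -1.0]]]
    b = [[0.0, 0.0], [0.5, 0.0]]
    hidden = maxout([1.0, -2.0], w, b)
    print(hidden)
    print(dropout(hidden, 0.5, random.Random(0)))
    print(csc_chunks(num_frames=10, chunk_len=4, left_ctx=2, right_ctx=1))
```

Because a maxout unit is piecewise linear, the gradient through the selected piece is passed on unscaled, which is the usual argument for its favorable gradient behavior. In the chunking function, each chunk can be processed independently by a bidirectional network, with the appended context frames standing in for the past and future frames that would otherwise be missing.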
