Integrating Gaussian mixtures into deep neural networks: Softmax layer with hidden variables
Zoltán Tüske, Muhammad Ali Tahir, R. Schlüter, H. Ney
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 2015. DOI: 10.1109/ICASSP.2015.7178779
In the hybrid approach, the neural network output directly serves as an estimate of the hidden Markov model (HMM) state posterior probabilities. In the tandem approach, by contrast, the neural network output is used as input features to improve classic Gaussian mixture model (GMM) based emission probability estimates. This paper shows that the GMM can be integrated easily into the deep neural network framework. By exploiting its equivalence with the log-linear mixture model (LMM), the GMM can be transformed into a large softmax layer followed by a summation pooling layer. Theoretical and experimental results indicate that the jointly trained and optimally chosen GMM and bottleneck tandem features cannot perform worse than a hybrid model. Thus, the question "hybrid vs. tandem" simplifies to optimizing the output layer of a neural network. Speech recognition experiments are carried out on a broadcast news and conversations task using up to 12 feed-forward hidden layers with sigmoid and rectified linear unit activation functions. The evaluation of the LMM layer shows recognition gains over the classic softmax output.
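To make the LMM construction concrete, the following is a minimal sketch (not the authors' code) of such an output layer in JAX. Each HMM state s gets K hidden-variable units, a single softmax is taken jointly over all S x K units, and a summation pooling step marginalizes out the mixture component. The function names, shapes, and the globally pooled covariance in the GMM-to-LMM conversion are illustrative assumptions; under a shared covariance the quadratic term is common to all units and cancels in the softmax, which is what makes the affine (log-linear) form possible.

```python
import jax
import jax.numpy as jnp

def lmm_layer(x, W, b, num_states, num_components):
    """LMM output layer: softmax over all S*K (state, component) units,
    then summation pooling over the K components of each state.

    x: (dim,) bottleneck feature vector
    W: (S*K, dim) weights, b: (S*K,) biases
    Returns p(s | x), shape (S,).
    """
    logits = W @ x + b                            # one logit per (s, k) pair
    joint = jax.nn.softmax(logits)                # joint posterior p(s, k | x)
    joint = joint.reshape(num_states, num_components)
    return joint.sum(axis=1)                      # marginalize hidden variable k

def gmm_to_lmm(means, precision, log_weights, log_priors):
    """Map a pooled-covariance GMM to equivalent LMM parameters
    (a hypothetical helper illustrating the claimed equivalence).

    means: (S, K, dim) component means
    precision: (dim, dim) shared inverse covariance
    log_weights: (S, K) log mixture weights, log_priors: (S,) log p(s)

    The term -0.5 * x^T precision x is identical for every (s, k)
    unit and cancels in the softmax, so only affine terms remain.
    """
    W = jnp.einsum('skd,de->ske', means, precision)        # Sigma^-1 mu
    b = (-0.5 * jnp.einsum('ske,ske->sk', W, means)
         + log_weights + log_priors[:, None])
    return W.reshape(-1, W.shape[-1]), b.reshape(-1)
```

With K = 1 the layer reduces to the classic softmax output, which is why the hybrid model appears as a special case and a jointly optimized LMM layer cannot do worse in principle.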