Can Non-Linear Readout Nodes Enhance the Performance of Reservoir-Based Speech Recognizers?

2011 First International Conference on Informatics and Computational Intelligence. Pub Date: 2011-12-12. DOI: 10.1109/ICI.2011.50

Fabian Triefenbach, J. Martens


Citations: 12

Abstract



It has been known for some time that a Recurrent Neural Network (RNN) can perform accurate acoustic-phonetic decoding of a continuous speech stream. However, training such a network by error back-propagation through time (EBPTT) is often delicate (it can end in a bad local optimum) and very time-consuming. These problems hamper the deployment of networks large enough to outperform state-of-the-art Hidden Markov Models. To overcome this drawback of RNNs, we recently proposed to employ a large pool of recurrently connected non-linear nodes with fixed weights (a so-called reservoir), and to map the reservoir outputs to meaningful phonemic classes by means of a layer of linear output nodes (called the readout nodes) whose weights are obtained by solving a set of linear equations. In this paper, we collect experimental evidence that the performance of a reservoir-based system can be enhanced by working with non-linear readout nodes. Although this calls for iterative training, it boils down to a non-linear regression, which seems to be less delicate and less time-consuming than EBPTT.
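The contrast the abstract draws, between a linear readout found by solving linear equations and a non-linear readout that needs iterative training, can be illustrated with a small sketch. This is not the authors' system: the abstract does not give the reservoir size, the readout non-linearity, or the training procedure. The code assumes tanh reservoir nodes, ridge regression for the linear readout, and a softmax readout trained by plain gradient descent on cross-entropy. All function names, hyperparameters, and the random toy data are hypothetical.

```python
import numpy as np


def make_reservoir(n_in, n_res, spectral_radius=0.9, seed=0):
    """Draw fixed random input and recurrent weights (they are never trained)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-1.0, 1.0, size=(n_res, n_in))
    w_res = rng.normal(size=(n_res, n_res))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius (assumed value).
    w_res *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w_res)))
    return w_in, w_res


def run_reservoir(w_in, w_res, inputs):
    """Return the reservoir state for every input frame, shape (T, n_res)."""
    state = np.zeros(w_res.shape[0])
    states = np.empty((inputs.shape[0], w_res.shape[0]))
    for t, frame in enumerate(inputs):
        state = np.tanh(w_in @ frame + w_res @ state)
        states[t] = state
    return states


def add_bias(states):
    """Append a constant 1 to every state vector."""
    return np.hstack([states, np.ones((states.shape[0], 1))])


def train_linear_readout(states, targets, ridge=1e-2):
    """Closed-form linear readout: solve (X^T X + ridge * I) W = X^T Y."""
    x = add_bias(states)
    gram = x.T @ x + ridge * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ targets)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(states, w, targets):
    """Mean cross-entropy of a softmax readout with weights w."""
    z = add_bias(states) @ w
    z = z - z.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -np.mean(np.sum(targets * log_p, axis=1))


def train_softmax_readout(states, targets, w_init, n_iter=200):
    """Iterative non-linear readout: gradient descent on cross-entropy.

    Only the readout weights change; the reservoir stays fixed.
    """
    x = add_bias(states)
    # Step 1/lambda_max of X^T X / N is small enough for stable descent on this convex loss.
    step = 1.0 / np.linalg.eigvalsh(x.T @ x / x.shape[0]).max()
    w = w_init.copy()
    for _ in range(n_iter):
        w -= step * (x.T @ (softmax(x @ w) - targets) / x.shape[0])
    return w


if __name__ == "__main__":
    # Toy stand-in for acoustic frames and phoneme labels (random, for shape only).
    rng = np.random.default_rng(1)
    frames = rng.normal(size=(300, 5))
    labels = rng.integers(0, 4, size=300)
    one_hot = np.eye(4)[labels]

    w_in, w_res = make_reservoir(n_in=5, n_res=50)
    states = run_reservoir(w_in, w_res, frames)
    w_lin = train_linear_readout(states, one_hot)
    w_soft = train_softmax_readout(states, one_hot, w_init=w_lin)
    print("cross-entropy, linear weights :", cross_entropy(states, w_lin, one_hot))
    print("cross-entropy, softmax readout:", cross_entropy(states, w_soft, one_hot))
```

In this sketch the iterative step is a convex regression over the readout weights alone, with no gradients propagated back through time. That is one plausible reading of why the abstract describes non-linear readout training as less delicate than EBPTT.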
