Modifying LSTM Posteriors with Manner of Articulation Knowledge to Improve Speech Recognition Performance

2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA) Pub Date : 2018-12-01 DOI:10.1109/ICMLA.2018.00122

Pradeep Rengaswamy, K. S. Rao

Pages: 769-772

Citations: 2

Abstract


Modifying LSTM Posteriors with Manner of Articulation Knowledge to Improve Speech Recognition Performance

Variants of recurrent neural networks (RNNs) such as long short-term memory (LSTM) have been successful in sequence modelling tasks such as automatic speech recognition (ASR). However, the decoded sequence is prone to false substitutions, insertions, and deletions. We exploit the spectral flatness measure (SFM), computed on the magnitude linear prediction (LP) spectrum, to detect two broad manners of articulation, namely sonorants and obstruents. In this paper, we modify the posteriors generated at the output layer of the LSTM according to the detected manner of articulation. The modified posteriors are passed to the conventional decoding graph to minimize false substitutions and insertions. Evaluated on the core TIMIT test corpus, the proposed method decreased the phone error rate (PER) by nearly 0.7% and 0.3% compared with conventional decoding using deep neural networks (DNN) and the state-of-the-art LSTM, respectively.
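The two ingredients named in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes the SFM is the ratio of the geometric mean to the arithmetic mean of the LP magnitude spectrum, and that noise-like obstruent frames tend to give flatter spectra than resonant sonorant frames. The LP order, FFT size, the `penalty` factor, and the function names `lp_coefficients`, `spectral_flatness`, and `reweight_posteriors` are all hypothetical choices made for illustration; the abstract does not state how the posteriors are actually modified.

```python
import numpy as np


def lp_coefficients(frame, order=12):
    """Autocorrelation-method LP coefficients via Levinson-Durbin (assumed order)."""
    frame = frame * np.hamming(len(frame))
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0] + 1e-12
    for i in range(1, order + 1):
        # Reflection coefficient for the order-i predictor.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err = max(err * (1.0 - k * k), 1e-12)
    return a


def spectral_flatness(frame, order=12, n_fft=512):
    """SFM of the LP magnitude spectrum: geometric mean / arithmetic mean."""
    a = lp_coefficients(frame, order)
    # The LP gain cancels in the ratio, so it is omitted here.
    mag = 1.0 / (np.abs(np.fft.rfft(a, n_fft)) + 1e-12)
    return float(np.exp(np.mean(np.log(mag))) / np.mean(mag))


def reweight_posteriors(post, phone_is_sonorant, frame_is_sonorant, penalty=0.5):
    """Hypothetical rule: scale down phones whose manner disagrees, then renormalise."""
    mismatch = phone_is_sonorant != frame_is_sonorant
    scaled = np.where(mismatch, post * penalty, post)
    return scaled / scaled.sum()
```

In such a sketch, a frame would be labelled sonorant or obstruent by comparing `spectral_flatness(frame)` against a threshold tuned on held-out data, and the reweighted posteriors would then be handed to the decoder in place of the raw LSTM outputs.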
