Instantaneous Frequency Filter-Bank Features for Low Resource Speech Recognition Using Deep Recurrent Architectures

Shekhar Nayak, C. S. Kumar, K. Murty
DOI: 10.1109/NCC52529.2021.9530049 (https://doi.org/10.1109/NCC52529.2021.9530049)
Published in: 2021 National Conference on Communications (NCC), 27 July 2021
Citation count: 1

Abstract

Recurrent neural networks (RNNs) and their variants have achieved significant success in speech recognition. Long short-term memory (LSTM) and gated recurrent units (GRUs) are the two most popular variants: they overcome the vanishing-gradient problem of RNNs and learn long-term dependencies effectively. Light gated recurrent units (Li-GRUs) are more compact versions of standard GRUs and have been shown to provide better recognition accuracy with significantly faster training. These RNN-inspired architectures invariably use magnitude-based features, and phase information is generally ignored. We propose incorporating features derived from the analytic phase of speech signals into speech recognition with these RNN variants. Instantaneous frequency filter-bank (IFFB) features, derived from Fourier transform relations, performed on par with standard MFCC features in recurrent-unit-based acoustic models despite being derived from phase information only. Different system combinations of IFFB features with magnitude-based features gave the lowest phone error rate (PER) of 12.9% and relative improvements of up to 16.8% over standalone MFCC features on TIMIT phone recognition with a Li-GRU-based architecture. IFFB features significantly outperformed modified group delay coefficient (MGDC) features in all our experiments.
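The abstract says IFFB features come from the analytic phase via Fourier transform relations but gives no formula. The sketch below assumes the commonly used relation for an analytic signal z(t) = a(t)exp(j*theta(t)): since z'/z = a'/a + j*theta', the instantaneous frequency is theta'(t) = Re{ IDFT[omega * Z(omega)](t) / z(t) }, which avoids explicit phase unwrapping. All function names are hypothetical, a naive stdlib DFT stands in for an FFT, and the filter-bank decomposition and frame-level pooling that turn per-sample IF into IFFB feature vectors are not shown, because the abstract does not specify them.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (stdlib only, for illustration)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse DFT matching dft() above."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def instantaneous_frequency(x, fs):
    """Instantaneous frequency in Hz of a real signal x sampled at fs Hz.

    Assumed relation: theta'(t) = Re{ IDFT[omega * Z](t) / z(t) }, where z is
    the analytic signal and omega is in rad/sample.
    """
    n = len(x)
    X = dft(x)
    # Analytic-signal spectrum: keep DC (and Nyquist for even n),
    # double positive frequencies, zero negative frequencies.
    Z = [0j] * n
    Z[0] = X[0]
    for k in range(1, (n + 1) // 2):
        Z[k] = 2 * X[k]
    if n % 2 == 0:
        Z[n // 2] = X[n // 2]
    # Angular frequency per bin; negative bins are zero in Z, so their value is unused.
    omega = [2 * math.pi * k / n for k in range(n)]
    z = idft(Z)
    dz = idft([w * Zk for w, Zk in zip(omega, Z)])
    # Re(dz / z) is theta' in rad/sample; convert to Hz.
    return [(dz[t] / z[t]).real * fs / (2 * math.pi) for t in range(n)]
```

For a 500 Hz cosine sampled at 8 kHz over 64 samples (an exact number of cycles), every returned value should be 500 Hz up to rounding error. In a filter-bank setting one would presumably apply this to each band-pass output separately, using an FFT instead of the naive DFT.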
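The abstract names Li-GRUs as the compact GRU variant behind the best results but does not define the cell. A minimal sketch, assuming the usual Li-GRU formulation: the reset gate is removed, the candidate state uses ReLU instead of tanh, and in the full model batch normalization is applied to the input projections (omitted here). Plain Python lists stand in for tensors, and all names are hypothetical.

```python
import math


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def li_gru_step(x, h_prev, Wz, Uz, Wh, Uh):
    """One Li-GRU time step (batch normalization omitted).

    z_t  = sigmoid(Wz x_t + Uz h_{t-1})    update gate; there is no reset gate
    hc_t = ReLU(Wh x_t + Uh h_{t-1})       candidate state uses ReLU, not tanh
    h_t  = z_t * h_{t-1} + (1 - z_t) * hc_t
    """
    z = [sigmoid(a + b) for a, b in zip(matvec(Wz, x), matvec(Uz, h_prev))]
    hc = [max(0.0, a + b) for a, b in zip(matvec(Wh, x), matvec(Uh, h_prev))]
    return [zi * hp + (1.0 - zi) * ci for zi, hp, ci in zip(z, h_prev, hc)]


def li_gru(xs, h0, Wz, Uz, Wh, Uh):
    """Run the cell over a sequence of feature vectors; return every hidden state."""
    h, states = h0, []
    for x in xs:
        h = li_gru_step(x, h, Wz, Uz, Wh, Uh)
        states.append(h)
    return states
```

Stacking such layers over frame-level feature vectors (MFCC, IFFB, or their concatenation) is one way to build the recurrent acoustic model the abstract refers to; the layer sizes and training setup used in the paper are not stated in the abstract.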
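Results are reported as phone error rate (PER). A minimal sketch of the standard definition: the Levenshtein distance (substitutions + deletions + insertions) between reference and hypothesis phone sequences, summed over utterances and divided by the total number of reference phones. TIMIT-specific scoring steps, such as the conventional folding of the 61 phone labels into 39 classes, are not shown, and the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,              # delete r
                         cur[j - 1] + 1,           # insert h
                         prev[j - 1] + (r != h))   # substitute, or match at no cost
        prev = cur
    return prev[-1]


def phone_error_rate(refs, hyps):
    """Corpus-level PER in percent: total edit errors / total reference phones."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total
```

Under this definition, a relative improvement over standalone MFCC features, as quoted in the abstract, is presumably (PER_MFCC - PER_combined) / PER_MFCC.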