在深度神经网络框架下对滤波器组和增量学习的改进

Tara N. Sainath, Brian Kingsbury, Abdel-rahman Mohamed, G. Saon, B. Ramabhadran
{"title":"在深度神经网络框架下对滤波器组和增量学习的改进","authors":"Tara N. Sainath, Brian Kingsbury, Abdel-rahman Mohamed, G. Saon, B. Ramabhadran","doi":"10.1109/ICASSP.2014.6854925","DOIUrl":null,"url":null,"abstract":"Many features used in speech recognition tasks are hand-crafted and are not always related to the objective at hand, that is minimizing word error rate. Recently, we showed that replacing a perceptually motivated mel-filter bank with a filter bank layer that is learned jointly with the rest of a deep neural network was promising. In this paper, we extend filter learning to a speaker-adapted, state-of-the-art system. First, we incorporate delta learning into the filter learning framework. Second, we incorporate various speaker adaptation techniques, including VTLN warping and speaker identity features. On a 50-hour English Broadcast News task, we show that we can achieve a 5% relative improvement in word error rate (WER) using the filter and delta learning, compared to having a fixed set of filters and deltas. Furthermore, after speaker adaptation, we find that filter and delta learning allows for a 3% relative improvement in WER compared to a state-of-the-art CNN.","PeriodicalId":6545,"journal":{"name":"2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"54 1","pages":"6839-6843"},"PeriodicalIF":0.0000,"publicationDate":"2014-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"Improvements to filterbank and delta learning within a deep neural network framework\",\"authors\":\"Tara N. Sainath, Brian Kingsbury, Abdel-rahman Mohamed, G. Saon, B. Ramabhadran\",\"doi\":\"10.1109/ICASSP.2014.6854925\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Many features used in speech recognition tasks are hand-crafted and are not always related to the objective at hand, that is minimizing word error rate. Recently, we showed that replacing a perceptually motivated mel-filter bank with a filter bank layer that is learned jointly with the rest of a deep neural network was promising. In this paper, we extend filter learning to a speaker-adapted, state-of-the-art system. First, we incorporate delta learning into the filter learning framework. Second, we incorporate various speaker adaptation techniques, including VTLN warping and speaker identity features. On a 50-hour English Broadcast News task, we show that we can achieve a 5% relative improvement in word error rate (WER) using the filter and delta learning, compared to having a fixed set of filters and deltas. Furthermore, after speaker adaptation, we find that filter and delta learning allows for a 3% relative improvement in WER compared to a state-of-the-art CNN.\",\"PeriodicalId\":6545,\"journal\":{\"name\":\"2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)\",\"volume\":\"54 1\",\"pages\":\"6839-6843\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-05-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICASSP.2014.6854925\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICASSP.2014.6854925","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 12

摘要

语音识别任务中使用的许多特征都是手工制作的,并不总是与手边的目标相关,即最小化单词错误率。最近,我们表明,用与深度神经网络的其余部分共同学习的滤波器库层取代感知激励的mel-filter bank是有希望的。在本文中,我们将滤波器学习扩展到一个自适应的、最先进的系统。首先,我们将增量学习合并到过滤器学习框架中。其次,我们结合了各种说话人自适应技术,包括VTLN翘曲和说话人身份特征。在一个50小时的英语广播新闻任务中,我们表明,与使用一组固定的过滤器和delta相比,使用过滤器和delta学习可以在单词错误率(WER)方面实现5%的相对改进。此外,在演讲者适应之后,我们发现与最先进的CNN相比,过滤器和delta学习允许在WER中相对提高3%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Improvements to filterbank and delta learning within a deep neural network framework
Many features used in speech recognition tasks are hand-crafted and are not always related to the objective at hand, that is minimizing word error rate. Recently, we showed that replacing a perceptually motivated mel-filter bank with a filter bank layer that is learned jointly with the rest of a deep neural network was promising. In this paper, we extend filter learning to a speaker-adapted, state-of-the-art system. First, we incorporate delta learning into the filter learning framework. Second, we incorporate various speaker adaptation techniques, including VTLN warping and speaker identity features. On a 50-hour English Broadcast News task, we show that we can achieve a 5% relative improvement in word error rate (WER) using the filter and delta learning, compared to having a fixed set of filters and deltas. Furthermore, after speaker adaptation, we find that filter and delta learning allows for a 3% relative improvement in WER compared to a state-of-the-art CNN.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信