Mixed Bandwidth Acoustic Modeling Leveraging Knowledge Distillation

Takashi Fukuda, Samuel Thomas
2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), published 2019-12-01
DOI: 10.1109/ASRU46091.2019.9003760 (https://doi.org/10.1109/ASRU46091.2019.9003760)
Citations: 4

Abstract

Training of mixed-bandwidth acoustic models has recently been realized by incorporating special Mel filterbanks. To fit information into every filterbank bin available across both narrowband and wideband data, these filterbanks pad zeros at the high-frequency ranges of narrowband data. Although these methods succeed in decreasing word error rates (WER) on wideband data, they fail to improve on narrowband signals. In this paper, we propose methods to mitigate these effects with generalized knowledge distillation. In our method, specialized teacher networks are first trained on lossless acoustic features with full-scale Mel filterbanks. While training student networks, privileged knowledge from these teacher networks is then used to compensate for the information missing at high frequencies that the special Mel filterbanks introduce. We show the benefit of the proposed technique over traditional methods on the Aurora 4 task for both narrowband data (10% relative WER improvement) and wideband data (7.5% relative WER improvement).
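
The two ideas in the abstract, zero-padding narrowband filterbank features to the wideband layout and training a student against a teacher's soft outputs, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the interpolation `weight`, the `temperature`, and the choice to pad with literal zeros after feature extraction are all assumptions made for illustration, and the abstract does not specify the network architectures or the exact loss.

```python
import math
from typing import List, Sequence


def pad_narrowband(frames: Sequence[Sequence[float]],
                   num_wideband_bins: int) -> List[List[float]]:
    """Zero-pad each narrowband frame up to the wideband bin count.

    Hypothetical helper: the low-frequency bins are kept as-is and the
    missing high-frequency bins are filled with zeros.
    """
    padded = []
    for frame in frames:
        if len(frame) > num_wideband_bins:
            raise ValueError("frame has more bins than the wideband layout")
        padded.append(list(frame) + [0.0] * (num_wideband_bins - len(frame)))
    return padded


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax over one frame of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_norm for x in scaled]


def distillation_loss(student_logits: Sequence[Sequence[float]],
                      teacher_logits: Sequence[Sequence[float]],
                      labels: Sequence[int],
                      weight: float = 0.5,
                      temperature: float = 1.0) -> float:
    """Frame-averaged mix of soft-target and hard-label cross-entropy.

    Assumed form: weight * CE(teacher soft targets, student)
                  + (1 - weight) * CE(hard label, student).
    The teacher logits would come from a network that saw the lossless
    (privileged) features; the student sees the zero-padded ones.
    """
    total = 0.0
    for s, t, y in zip(student_logits, teacher_logits, labels):
        teacher_probs = [math.exp(v) for v in log_softmax(t, temperature)]
        student_soft = log_softmax(s, temperature)
        soft_ce = -sum(p * q for p, q in zip(teacher_probs, student_soft))
        hard_ce = -log_softmax(s)[y]
        total += weight * soft_ce + (1.0 - weight) * hard_ce
    return total / len(labels)


# Usage: pad one 3-bin narrowband frame to a 5-bin wideband layout.
print(pad_narrowband([[1.0, 2.0, 3.0]], 5))  # [[1.0, 2.0, 3.0, 0.0, 0.0]]
```

In this sketch, setting `weight` to 0 reduces the loss to ordinary cross-entropy on the hard labels, and raising it makes the student follow the teacher's posteriors more closely.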