Occam’s Adaptation: A Comparison of Interpolation of Bases Adaptation Methods for Multi-Dialect Acoustic Modeling with LSTMs

M. Grace, M. Bastani, Eugene Weinstein
{"title":"Occam’s Adaptation: A Comparison of Interpolation of Bases Adaptation Methods for Multi-Dialect Acoustic Modeling with LSTMS","authors":"M. Grace, M. Bastani, Eugene Weinstein","doi":"10.1109/SLT.2018.8639654","DOIUrl":null,"url":null,"abstract":"Multidialectal languages can pose challenges for acoustic modeling. Past research has shown that with a large training corpus but without explicit modeling of inter-dialect variability, training individual per-dialect models yields superior performance to that of a single model trained on the combined data [1, 2]. In this work, we were motivated by the idea that adaptation techniques can allow the models to learn dialect-independent features and in turn leverage the power of the larger training corpus sizes afforded when pooling data across dialects. Our goal was thus to create a single multidialect acoustic model that would rival the performance of the dialect-specific models.Working in the context of deep Long-Short Term Memory (LSTM) acoustic models trained on up to 40K hours of speech, we explored several methods for training and incorporating dialect-specific information into the model, including 12 variants of interpolation-of-bases techniques related to Cluster Adaptive Training (CAT) [3] and Factorized Hidden Layer (FHL) [4] techniques. We found that with our model topology and large training corpus, simply appending the dialect-specific information to the feature vector resulted in a more accurate model than any of the more complex interpolation-of-bases techniques, while requiring less model complexity and fewer parameters. This simple adaptation yielded a single unified model for all dialects that, in most cases, outperformed individual models which had been trained per-dialect.","PeriodicalId":377307,"journal":{"name":"2018 IEEE Spoken Language Technology Workshop (SLT)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE Spoken Language Technology Workshop (SLT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SLT.2018.8639654","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 14

Abstract

Multidialectal languages can pose challenges for acoustic modeling. Past research has shown that with a large training corpus but without explicit modeling of inter-dialect variability, training individual per-dialect models yields superior performance to that of a single model trained on the combined data [1, 2]. In this work, we were motivated by the idea that adaptation techniques can allow the models to learn dialect-independent features and in turn leverage the power of the larger training corpus sizes afforded when pooling data across dialects. Our goal was thus to create a single multidialect acoustic model that would rival the performance of the dialect-specific models. Working in the context of deep Long Short-Term Memory (LSTM) acoustic models trained on up to 40K hours of speech, we explored several methods for training and incorporating dialect-specific information into the model, including 12 variants of interpolation-of-bases techniques related to Cluster Adaptive Training (CAT) [3] and Factorized Hidden Layer (FHL) [4] techniques. We found that with our model topology and large training corpus, simply appending the dialect-specific information to the feature vector resulted in a more accurate model than any of the more complex interpolation-of-bases techniques, while requiring less model complexity and fewer parameters. This simple adaptation yielded a single unified model for all dialects that, in most cases, outperformed individual models which had been trained per-dialect.
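To make the comparison concrete, the following is a minimal NumPy sketch, not the authors' implementation, contrasting the two families of approaches the abstract describes: appending a one-hot dialect code to the acoustic feature vector, versus a CAT/FHL-style layer whose effective weights are a dialect-dependent interpolation of a set of basis matrices. All names (`append_dialect_features`, `InterpolatedLayer`), dimensions, and the number of bases are illustrative assumptions.

```python
# Sketch of the two adaptation strategies compared in the paper (assumed
# details; not the authors' code). Dimensions are illustrative.
import numpy as np

NUM_DIALECTS = 4   # assumption: e.g. four dialects of one language
FEAT_DIM = 80      # assumption: 80-dim acoustic features per frame
HIDDEN = 512       # assumption: hidden layer width

def one_hot(dialect_id, num_dialects=NUM_DIALECTS):
    v = np.zeros(num_dialects, dtype=np.float32)
    v[dialect_id] = 1.0
    return v

# --- Strategy 1: the "simple" adaptation that won in the paper ---
# Append a one-hot dialect code to every acoustic frame before the LSTM stack.
def append_dialect_features(frames, dialect_id):
    """frames: (T, FEAT_DIM) -> (T, FEAT_DIM + NUM_DIALECTS)."""
    code = np.tile(one_hot(dialect_id), (frames.shape[0], 1))
    return np.concatenate([frames, code], axis=1)

# --- Strategy 2: interpolation of bases (CAT/FHL-style), schematically ---
# Keep K basis weight matrices W_k and mix them with dialect-dependent
# interpolation coefficients lambda_k(d):  W(d) = sum_k lambda_k(d) * W_k
class InterpolatedLayer:
    def __init__(self, in_dim, out_dim, num_bases):
        rng = np.random.default_rng(0)
        self.bases = rng.normal(0, 0.02, (num_bases, out_dim, in_dim)).astype(np.float32)
        # One interpolation vector per dialect; learned jointly in practice.
        self.lambdas = rng.normal(0, 0.1, (NUM_DIALECTS, num_bases)).astype(np.float32)

    def __call__(self, x, dialect_id):
        # Dialect-specific weight matrix as a weighted sum of the bases.
        w = np.einsum("k,koi->oi", self.lambdas[dialect_id], self.bases)
        return np.tanh(x @ w.T)

# Usage: 100 frames of speech from dialect 2.
frames = np.random.randn(100, FEAT_DIM).astype(np.float32)
augmented = append_dialect_features(frames, dialect_id=2)   # (100, 84)
layer = InterpolatedLayer(FEAT_DIM, HIDDEN, num_bases=8)
hidden = layer(frames, dialect_id=2)                        # (100, 512)
```

Note the parameter-count asymmetry this sketch makes visible: the concatenation strategy adds only NUM_DIALECTS extra input dimensions to the first layer, whereas the interpolation-of-bases strategy stores K full weight matrices per adapted layer, which is consistent with the abstract's finding that the simple approach needed less model complexity and fewer parameters.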