Multi-task ensembles with teacher-student training

J. H. M. Wong, M. Gales
DOI: 10.1109/ASRU.2017.8268920
2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 2017
Citations: 14

Abstract

Ensemble methods often yield significant gains for automatic speech recognition. One method to obtain a diverse ensemble is to separately train models with a range of context dependent targets, often implemented as state clusters. However, decoding the complete ensemble can be computationally expensive. To reduce this cost, the ensemble can be generated using a multi-task architecture. Here, the hidden layers are merged across all members of the ensemble, leaving only separate output layers for each set of targets. Previous investigations of this form of ensemble have used cross-entropy training, which is shown in this paper to produce only limited diversity between members of the ensemble. This paper extends the multi-task framework in several ways. First, the multi-task ensemble can be trained in a teacher-student fashion toward the ensemble of separate models, with the aim of increasing diversity. Second, the multi-task ensemble can be trained with a sequence discriminative criterion. Finally, a student model, with a single output layer, can be trained to emulate the combined ensemble, to further reduce the computational cost of decoding. These methods are evaluated on the Babel conversational telephone speech, AMI meeting transcription, and HUB4 English broadcast news tasks.
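The multi-task architecture described above can be sketched as follows: a single shared hidden-layer trunk feeds several separate output layers, one per set of context-dependent state-cluster targets. This is a minimal NumPy illustration, not the paper's actual model; the layer sizes, the two-member ensemble, and the single `tanh` hidden layer are all hypothetical choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Shared trunk: the hidden layers are merged across all ensemble members.
W_hidden = rng.standard_normal((40, 64)) * 0.1          # features -> hidden

# Separate output layers, one per target set; the state-cluster
# inventory sizes (500 and 600) are made up for this sketch.
heads = [rng.standard_normal((64, n)) * 0.1 for n in (500, 600)]

def multi_task_forward(features):
    h = np.tanh(features @ W_hidden)                    # shared representation
    return [softmax(h @ W) for W in heads]              # one posterior per member

x = rng.standard_normal((8, 40))                        # 8 frames of 40-dim features
posteriors = multi_task_forward(x)                      # list of per-member posteriors
```

Decoding only needs one forward pass through the shared trunk, which is why this layout is cheaper than running each ensemble member as a fully separate network.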
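The teacher-student step can likewise be sketched with a frame-level KL divergence: a student posterior is trained toward a target distribution derived from the teacher ensemble. Here the target is a simple average of the teacher posteriors; the paper's exact combination scheme and sequence-level criteria are not reproduced, and all shapes and the averaging choice are assumptions of this sketch.

```python
import numpy as np

def kl_div(p, q, eps=1e-12):
    # Mean frame-level KL(p || q); eps guards against log(0).
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

rng = np.random.default_rng(1)

def random_posteriors(shape):
    z = rng.random(shape)
    return z / z.sum(axis=-1, keepdims=True)

# Four separately trained teacher members (hypothetical), 8 frames x 500 states.
teachers = [random_posteriors((8, 500)) for _ in range(4)]
target = np.mean(teachers, axis=0)        # combined ensemble posterior
student = random_posteriors((8, 500))     # single-output-layer student

loss = kl_div(target, student)            # quantity the student would minimise
```

Minimising this KL term drives the single-output-layer student toward the combined ensemble, which is the mechanism the abstract invokes to cut decoding cost to one model.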