2016年多语言BLSTM和特定说话人的载体适应，但巴别塔系统

2016 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2016-12-01 DOI:10.1109/SLT.2016.7846330

M. Karafiát, M. Baskar, P. Matejka, Karel Veselý, F. Grézl, J. Černocký

{"title":"2016年多语言BLSTM和特定说话人的载体适应，但巴别塔系统","authors":"M. Karafiát, M. Baskar, P. Matejka, Karel Veselý, F. Grézl, J. Černocký","doi":"10.1109/SLT.2016.7846330","DOIUrl":null,"url":null,"abstract":"This paper provides an extensive summary of BUT 2016 system for the last IARPA Babel evaluations. It concentrates on multi-lingual training of both deep neural network (DNN)-based feature extraction and acoustic models including multilingual training of bidirectional Long Short Term memory networks. Next, two low-dimensional vector approaches to speaker adaptation are investigated: i-vectors and sequence-summarizing neural networks (SSNN). The results provided on three Babel Year 4 languages show clear advantage of both approaches in case limited amount of training data is available. The time necessary for the development of a new system is addressed too, as some of the investigated techniques do not require extensive re-training of the whole system.","PeriodicalId":281635,"journal":{"name":"2016 IEEE Spoken Language Technology Workshop (SLT)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"23","resultStr":"{\"title\":\"Multilingual BLSTM and speaker-specific vector adaptation in 2016 but babel system\",\"authors\":\"M. Karafiát, M. Baskar, P. Matejka, Karel Veselý, F. Grézl, J. Černocký\",\"doi\":\"10.1109/SLT.2016.7846330\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper provides an extensive summary of BUT 2016 system for the last IARPA Babel evaluations. It concentrates on multi-lingual training of both deep neural network (DNN)-based feature extraction and acoustic models including multilingual training of bidirectional Long Short Term memory networks. Next, two low-dimensional vector approaches to speaker adaptation are investigated: i-vectors and sequence-summarizing neural networks (SSNN). The results provided on three Babel Year 4 languages show clear advantage of both approaches in case limited amount of training data is available. The time necessary for the development of a new system is addressed too, as some of the investigated techniques do not require extensive re-training of the whole system.\",\"PeriodicalId\":281635,\"journal\":{\"name\":\"2016 IEEE Spoken Language Technology Workshop (SLT)\",\"volume\":\"62 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"23\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 IEEE Spoken Language Technology Workshop (SLT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SLT.2016.7846330\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE Spoken Language Technology Workshop (SLT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SLT.2016.7846330","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 23

摘要

本文对上届IARPA巴别塔评估的BUT 2016系统进行了广泛的总结。主要研究基于深度神经网络(DNN)的特征提取和声学模型的多语言训练，包括双向长短期记忆网络的多语言训练。接下来，研究了两种低维向量方法:i向量和序列汇总神经网络(SSNN)。对三种Babel四年级语文的结果表明，在训练数据有限的情况下，这两种方法都有明显的优势。开发新系统所需的时间也得到了解决，因为所研究的一些技术不需要对整个系统进行广泛的重新培训。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Multilingual BLSTM and speaker-specific vector adaptation in 2016 but babel system

This paper provides an extensive summary of BUT 2016 system for the last IARPA Babel evaluations. It concentrates on multi-lingual training of both deep neural network (DNN)-based feature extraction and acoustic models including multilingual training of bidirectional Long Short Term memory networks. Next, two low-dimensional vector approaches to speaker adaptation are investigated: i-vectors and sequence-summarizing neural networks (SSNN). The results provided on three Babel Year 4 languages show clear advantage of both approaches in case limited amount of training data is available. The time necessary for the development of a new system is addressed too, as some of the investigated techniques do not require extensive re-training of the whole system.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2016 IEEE Spoken Language Technology Workshop (SLT)

自引率

0.00%

发文量