{"title":"Unsupervised cross-lingual knowledge transfer in DNN-based LVCSR","authors":"P. Swietojanski, Arnab Ghoshal, S. Renals","doi":"10.1109/SLT.2012.6424230","DOIUrl":null,"url":null,"abstract":"We investigate the use of cross-lingual acoustic data to initialise deep neural network (DNN) acoustic models by means of unsupervised restricted Boltzmann machine (RBM) pre-training. DNNs for German are pretrained using one or all of German, Portuguese, Spanish and Swedish. The DNNs are used in a tandem configuration, where the network outputs are used as features for a hidden Markov model (HMM) whose emission densities are modeled by Gaussian mixture models (GMMs), as well as in a hybrid configuration, where the network outputs are used as the HMM state likelihoods. The experiments show that unsupervised pretraining is more crucial for the hybrid setups, particularly with limited amounts of transcribed training data. More importantly, unsupervised pretraining is shown to be language-independent.","PeriodicalId":375378,"journal":{"name":"2012 IEEE Spoken Language Technology Workshop (SLT)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"146","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE Spoken Language Technology Workshop (SLT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SLT.2012.6424230","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 146
Abstract
We investigate the use of cross-lingual acoustic data to initialise deep neural network (DNN) acoustic models by means of unsupervised restricted Boltzmann machine (RBM) pre-training. DNNs for German are pre-trained using one or all of German, Portuguese, Spanish and Swedish. The DNNs are used in a tandem configuration, where the network outputs are used as features for a hidden Markov model (HMM) whose emission densities are modelled by Gaussian mixture models (GMMs), as well as in a hybrid configuration, where the network outputs are used as the HMM state likelihoods. The experiments show that unsupervised pre-training is more crucial for the hybrid setups, particularly with limited amounts of transcribed training data. More importantly, unsupervised pre-training is shown to be language-independent.
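For readers unfamiliar with the hybrid configuration mentioned in the abstract, the sketch below illustrates the standard way DNN softmax posteriors are converted into scaled HMM state likelihoods by dividing by the state priors. This is a minimal illustration of the general technique, not code from the paper; the function and variable names are illustrative.

```python
import numpy as np

def posteriors_to_scaled_log_likelihoods(dnn_posteriors, state_priors, floor=1e-8):
    """Convert DNN softmax outputs p(state | frame) into scaled log-likelihoods
    log p(frame | state) - log p(frame), i.e. log posterior minus log prior,
    as is standard in hybrid DNN-HMM decoding (Bayes' rule up to a constant).

    dnn_posteriors: (num_frames, num_states) softmax outputs of the network
    state_priors:   (num_states,) relative frequencies of the HMM states,
                    typically estimated from the forced alignments of the
                    training data
    """
    # Floor both terms to avoid log(0) for states that are never observed.
    log_post = np.log(np.maximum(dnn_posteriors, floor))
    log_prior = np.log(np.maximum(state_priors, floor))
    return log_post - log_prior
```

In the tandem configuration, by contrast, the network outputs (or an intermediate-layer representation of them) are used as input features to a conventional GMM-HMM system rather than replacing its emission likelihoods.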