DNN声学模型分解隐层自适应的低秩基

2016 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2016-12-01 DOI:10.1109/SLT.2016.7846332

Lahiru Samarakoon, K. Sim

{"title":"DNN声学模型分解隐层自适应的低秩基","authors":"Lahiru Samarakoon, K. Sim","doi":"10.1109/SLT.2016.7846332","DOIUrl":null,"url":null,"abstract":"Recently, the factorized hidden layer (FHL) adaptation method is proposed for speaker adaptation of deep neural network (DNN) acoustic models. An FHL contains a speaker-dependent (SD) transformation matrix using a linear combination of rank-1 matrices and an SD bias using a linear combination of vectors, in addition to the standard affine transformation. On the other hand, full-rank bases are used with a similar DNN adaptation method which is based on cluster adaptive training (CAT). Therefore, it is interesting to investigate the effect of the rank of the bases used for adaptation. The increase of the rank of the bases improves the speaker subspace representation, without increasing the number of learnable speaker parameters. In this work, we investigate the effect of using various ranks for the bases of the SD transformation of FHLs on Aurora 4, AMI IHM and AMI SDM tasks. Experimental results have shown that when one FHL layer is used, it is optimal to use low-ranked bases of rank-50, instead of full-rank bases. Furthermore, when multiple FHLs are used, rank-1 bases are sufficient.","PeriodicalId":281635,"journal":{"name":"2016 IEEE Spoken Language Technology Workshop (SLT)","volume":"110 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Low-rank bases for factorized hidden layer adaptation of DNN acoustic models\",\"authors\":\"Lahiru Samarakoon, K. Sim\",\"doi\":\"10.1109/SLT.2016.7846332\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recently, the factorized hidden layer (FHL) adaptation method is proposed for speaker adaptation of deep neural network (DNN) acoustic models. An FHL contains a speaker-dependent (SD) transformation matrix using a linear combination of rank-1 matrices and an SD bias using a linear combination of vectors, in addition to the standard affine transformation. On the other hand, full-rank bases are used with a similar DNN adaptation method which is based on cluster adaptive training (CAT). Therefore, it is interesting to investigate the effect of the rank of the bases used for adaptation. The increase of the rank of the bases improves the speaker subspace representation, without increasing the number of learnable speaker parameters. In this work, we investigate the effect of using various ranks for the bases of the SD transformation of FHLs on Aurora 4, AMI IHM and AMI SDM tasks. Experimental results have shown that when one FHL layer is used, it is optimal to use low-ranked bases of rank-50, instead of full-rank bases. Furthermore, when multiple FHLs are used, rank-1 bases are sufficient.\",\"PeriodicalId\":281635,\"journal\":{\"name\":\"2016 IEEE Spoken Language Technology Workshop (SLT)\",\"volume\":\"110 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 IEEE Spoken Language Technology Workshop (SLT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SLT.2016.7846332\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE Spoken Language Technology Workshop (SLT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SLT.2016.7846332","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 5

摘要

近年来，针对深度神经网络(DNN)声学模型的说话人自适应，提出了分解隐藏层(FHL)自适应方法。除了标准仿射变换之外，FHL还包含一个使用秩1矩阵线性组合的扬声器相关(SD)变换矩阵和一个使用向量线性组合的SD偏置。另一方面，将全秩基与基于聚类自适应训练(CAT)的DNN自适应方法结合使用。因此，研究用于适应的碱基等级的影响是一个有趣的问题。在不增加可学习的说话人参数数量的情况下，基秩的增加改善了说话人子空间的表示。在这项工作中，我们研究了在Aurora 4、AMI IHM和AMI SDM任务中使用不同等级的FHLs的SD转换碱基的影响。实验结果表明，当使用一个FHL层时，使用rank-50的低秩碱基比使用全秩碱基更优。此外，当使用多个fhl时，排名1的碱基就足够了。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Low-rank bases for factorized hidden layer adaptation of DNN acoustic models

Recently, the factorized hidden layer (FHL) adaptation method is proposed for speaker adaptation of deep neural network (DNN) acoustic models. An FHL contains a speaker-dependent (SD) transformation matrix using a linear combination of rank-1 matrices and an SD bias using a linear combination of vectors, in addition to the standard affine transformation. On the other hand, full-rank bases are used with a similar DNN adaptation method which is based on cluster adaptive training (CAT). Therefore, it is interesting to investigate the effect of the rank of the bases used for adaptation. The increase of the rank of the bases improves the speaker subspace representation, without increasing the number of learnable speaker parameters. In this work, we investigate the effect of using various ranks for the bases of the SD transformation of FHLs on Aurora 4, AMI IHM and AMI SDM tasks. Experimental results have shown that when one FHL layer is used, it is optimal to use low-ranked bases of rank-50, instead of full-rank bases. Furthermore, when multiple FHLs are used, rank-1 bases are sufficient.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2016 IEEE Spoken Language Technology Workshop (SLT)

自引率

0.00%

发文量