Aditya Raikar, Meet H. Soni, Ashish Panda, S. Kopparapu
Acoustic Model Adaptation In Reverberant Conditions Using Multi-task Learned Embeddings
2022 30th European Signal Processing Conference (EUSIPCO), 29 August 2022
DOI: 10.23919/eusipco55093.2022.9909579
Citations: 0
Abstract
The acoustic environment plays a major role in the performance of a large-scale Automatic Speech Recognition (ASR) system. The task becomes considerably more challenging when substantial distortions, such as background noise and reverberation, are present. Of late, it has become standard to use i-vectors for Acoustic Model (AM) adaptation. Embeddings from Single Task Learned (STL) neural network systems, such as x-vectors and r-vectors, have also been used with varying degrees of success. This paper proposes the use of Multi Task Learned (MTL) embeddings for large-vocabulary hybrid acoustic model adaptation in reverberant environments. MTL embeddings are extracted from an affine layer of a deep neural network trained on multiple tasks, such as speaker information and room information. Our experiments show that the proposed MTL embeddings outperform i-vectors, x-vectors and r-vectors for AM adaptation in reverberant conditions. Moreover, the proposed MTL embeddings can be fused with i-vectors to provide further improvement. We provide results on artificially reverberated Librispeech data as well as real-world reverberated HRRE data. On the Librispeech database, the proposed method provides relative improvements of 10.9% and 8.7% over the i-vector baseline on reverberated test-clean and test-other data respectively, while a relative improvement of 7% over i-vectors is observed when the proposed system is tested on the HRRE dataset.
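The abstract describes extracting the MTL embedding from an affine layer of a network trained jointly on a speaker task and a room task. The sketch below illustrates that shared-trunk/multi-head architecture in plain Python; all names, dimensions, and weight values are illustrative assumptions, not details taken from the paper.

```python
import random

# Minimal sketch of a multi-task network (all dimensions hypothetical):
# one shared affine "trunk" feeds two classification heads (speaker and
# room). The trunk's activation is what would be taken as the MTL
# embedding for acoustic model adaptation.

random.seed(0)

def affine(x, W, b):
    """y = Wx + b, with W stored as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def relu(v):
    return [max(0.0, u) for u in v]

def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)]
            for _ in range(rows)]

in_dim, emb_dim = 8, 4        # acoustic feature dim, embedding dim (assumed)
n_speakers, n_rooms = 3, 2    # sizes of the two task heads (assumed)

W_shared, b_shared = rand_matrix(emb_dim, in_dim), [0.0] * emb_dim
W_spk, b_spk = rand_matrix(n_speakers, emb_dim), [0.0] * n_speakers
W_room, b_room = rand_matrix(n_rooms, emb_dim), [0.0] * n_rooms

def forward(features):
    # The shared layer is trained on both tasks jointly; at adaptation
    # time its activation is extracted as the MTL embedding.
    embedding = relu(affine(features, W_shared, b_shared))
    speaker_logits = affine(embedding, W_spk, b_spk)
    room_logits = affine(embedding, W_room, b_room)
    return embedding, speaker_logits, room_logits

x = [random.gauss(0.0, 1.0) for _ in range(in_dim)]
emb, spk_out, room_out = forward(x)
```

For the fusion variant mentioned in the abstract, the MTL embedding would simply be concatenated with the utterance's i-vector before being appended to the acoustic features; the paper itself does not specify the fusion mechanism here, so concatenation is an assumption.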