{"title":"子空间快速说话人自适应研究","authors":"Michael Zhang, Jun Xu","doi":"10.1109/CHINSL.2004.1409639","DOIUrl":null,"url":null,"abstract":"Speaker adaptation is an essential part of any state-of-the-art automatic speech recognizer (ASR). Recently, more and more application requirements have appeared for embedded ASR. For these cases, a more compact speech model, subspace distribution clustering hidden Markov model (SDCHMM) is used instead of continuous density hidden Markov model (CDHMM). In previous studies on SDCHMM adaptation, the subspace Gaussian pools of SDCHMM are the parameters to be adjusted for speaker variations. Alternatively, we try to employ the link table parameters of SDCHMM, which defines the tying structure in subspaces, to model the inter-speaker mismatch, with the Gaussian parameters maintained. Since the variation range for the parameters is highly limited, this method is potentially faster than conventional Gaussian pools adaptation. A comparative study on a continuous digital dialing (CDD) task shows that when data is seriously insufficient, link table adaptation is more effective than conventional methods, with 17% relative improvement in utterance accuracy rate, compared to 14% improvement by previous Gaussian adaptation. However, further improvement with more data is limited. When data size is doubled, this method gave 21% improvement, compared to 30% improvement by the conventional method.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"43 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"An investigation into subspace rapid speaker adaptation\",\"authors\":\"Michael Zhang, Jun Xu\",\"doi\":\"10.1109/CHINSL.2004.1409639\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Speaker adaptation is an essential part of any state-of-the-art automatic speech recognizer (ASR). Recently, more and more application requirements have appeared for embedded ASR. For these cases, a more compact speech model, subspace distribution clustering hidden Markov model (SDCHMM) is used instead of continuous density hidden Markov model (CDHMM). In previous studies on SDCHMM adaptation, the subspace Gaussian pools of SDCHMM are the parameters to be adjusted for speaker variations. Alternatively, we try to employ the link table parameters of SDCHMM, which defines the tying structure in subspaces, to model the inter-speaker mismatch, with the Gaussian parameters maintained. Since the variation range for the parameters is highly limited, this method is potentially faster than conventional Gaussian pools adaptation. A comparative study on a continuous digital dialing (CDD) task shows that when data is seriously insufficient, link table adaptation is more effective than conventional methods, with 17% relative improvement in utterance accuracy rate, compared to 14% improvement by previous Gaussian adaptation. However, further improvement with more data is limited. 
When data size is doubled, this method gave 21% improvement, compared to 30% improvement by the conventional method.\",\"PeriodicalId\":212562,\"journal\":{\"name\":\"2004 International Symposium on Chinese Spoken Language Processing\",\"volume\":\"43 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2004-12-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2004 International Symposium on Chinese Spoken Language Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CHINSL.2004.1409639\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2004 International Symposium on Chinese Spoken Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CHINSL.2004.1409639","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
An investigation into subspace rapid speaker adaptation
Speaker adaptation is an essential part of any state-of-the-art automatic speech recognizer (ASR). Recently, demand for embedded ASR has grown, and in such resource-constrained settings a more compact speech model, the subspace distribution clustering hidden Markov model (SDCHMM), is used in place of the continuous density hidden Markov model (CDHMM). Previous studies on SDCHMM adaptation adjust the subspace Gaussian pools of the SDCHMM to account for speaker variation. Alternatively, we employ the link table parameters of the SDCHMM, which define the tying structure within the subspaces, to model inter-speaker mismatch while keeping the Gaussian parameters fixed. Since the range of variation of these parameters is highly limited, this method is potentially faster than conventional Gaussian pool adaptation. A comparative study on a continuous digital dialing (CDD) task shows that when adaptation data is severely insufficient, link table adaptation is more effective than the conventional method, giving a 17% relative improvement in utterance accuracy versus 14% for the previous Gaussian adaptation. However, further improvement with more data is limited: when the amount of adaptation data is doubled, the proposed method gives a 21% improvement, compared with 30% for the conventional method.
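To make the two adaptation targets concrete, the following is a minimal sketch of an SDCHMM-like structure and of link-table re-selection. It is written under stated assumptions, not as the authors' implementation: the names (SubspaceSDCHMM, adapt_link_table) and the maximum-likelihood re-selection criterion are illustrative, and the paper's actual link-table adaptation procedure and its handling of sparse data may differ. The sketch only shows why the search space is small: each state's per-stream parameter is a single index into a fixed Gaussian pool, and the pools themselves stay untouched.

```python
import numpy as np


class SubspaceSDCHMM:
    """Hypothetical SDCHMM container: each feature vector is split into K
    low-dimensional subspaces (streams); every stream has a small shared pool
    of diagonal Gaussians, and each state's density in a stream is just an
    index into that pool (the link table)."""

    def __init__(self, pools, link_table):
        # pools[k]      : list of (mean, var) arrays for stream k's Gaussian pool
        # link_table[s] : dict mapping stream k -> pool index used by state s
        self.pools = pools
        self.link_table = link_table

    def stream_log_likelihood(self, k, pool_idx, x_k):
        # Log-likelihood of stream-k sub-vector x_k under one pool Gaussian.
        mean, var = self.pools[k][pool_idx]
        return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x_k - mean) ** 2 / var)


def adapt_link_table(model, aligned_frames):
    """Link-table adaptation (sketch): keep the Gaussian pools fixed and, for
    each state/stream, re-pick the pool Gaussian that best fits the speaker's
    aligned adaptation frames. aligned_frames[s][k] is the list of stream-k
    sub-vectors assigned to state s by a forced alignment (assumed given)."""
    for s, streams in aligned_frames.items():
        for k, frames in streams.items():
            if not frames:
                continue  # no adaptation data for this state/stream: keep the SI index
            scores = [
                sum(model.stream_log_likelihood(k, g, x) for x in frames)
                for g in range(len(model.pools[k]))
            ]
            model.link_table[s][k] = int(np.argmax(scores))
    return model


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy setup: two streams, each with a pool of three one-dimensional Gaussians.
    pools = [[(np.array([m]), np.array([1.0])) for m in (-1.0, 0.0, 1.0)] for _ in range(2)]
    link_table = {0: {0: 0, 1: 0}}
    model = SubspaceSDCHMM(pools, link_table)
    # Pretend a forced alignment assigned these speaker frames to state 0, stream 0.
    aligned = {0: {0: [np.array([0.9]) + 0.1 * rng.standard_normal(1) for _ in range(20)],
                   1: []}}
    adapt_link_table(model, aligned)
    print(model.link_table)  # state 0, stream 0 should now point to the pool Gaussian near 1.0
```

Because each adjustable parameter in this sketch ranges only over the (small) pool size, far fewer statistics are needed than for re-estimating Gaussian means and variances, which is consistent with the abstract's observation that link table adaptation helps most when adaptation data is scarce.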