Analysis of cross-gender adaptation using MAP and MLLR in speech recognition systems

2013 International Conference on Recent Trends in Information Technology (ICRTIT). Pub Date: 2013-07-25. DOI: 10.1109/ICRTIT.2013.6844235

S. Mahiba, S. Christina, P. Vijayalakshmi, T. Nagarajan



Abstract



Speech recognition systems built with context-dependent phonemes capture the co-articulation effect and perform better than systems built with context-independent units. However, performance also depends on the speaker. This speaker dependence arises from speaker-dependent speech features, and variation in vocal tract length and shape is the major cause of inter-speaker variation. As a result, speaker-dependent (SD) systems outperform speaker-independent (SI) systems. It is well established in the literature that speaker adaptation can bring the recognition performance of an SI system up to the standard of an SD system. This paper analyzes the amount and ratio of male and female training data for which cross-gender speaker adaptation gives higher performance. The speaker adaptation techniques MAP (maximum a posteriori) and MLLR (maximum likelihood linear regression) are implemented using the TIMIT speech corpus. MLLR is observed to adapt the model parameters better than MAP, even with only 24 s of adaptation data. It is also inferred that training the system with both male and female data gives better cross-gender adaptation performance than training with either male or female data alone, primarily because the system parameters differ greatly between male and female speakers. For the minimal amount of data, the overall recognition performance of the context-dependent system improves by 0.55% with MAP adaptation and 2.75% with MLLR adaptation over the unadapted recognition system.
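The observation that MLLR outperforms MAP when adaptation data is scarce can be illustrated with a toy sketch. The code below is not the authors' implementation. It assumes the standard textbook forms of the two mean updates, reduced to one-dimensional Gaussians, and every name and number in it (`map_adapt_means`, `mllr_adapt_means`, `tau`, the example statistics) is hypothetical. MAP interpolates each Gaussian's prior mean with its own data, so a Gaussian that never occurs in the adaptation data stays unchanged. MLLR estimates one shared transform from all observed Gaussians and applies it to every mean, including unseen ones.

```python
# Toy 1-D illustration of MAP vs. MLLR mean adaptation (textbook forms).
# All names and numbers are hypothetical and not taken from the paper.

def map_adapt_means(means, stats, tau=10.0):
    """MAP mean update: interpolate each prior mean with its own data.

    stats[m] = (occupancy, sum_of_frames) for Gaussian m. A Gaussian with
    zero occupancy keeps its prior (speaker-independent) mean.
    """
    return [(tau * mu + sx) / (tau + n) for mu, (n, sx) in zip(means, stats)]


def mllr_adapt_means(means, variances, stats):
    """MLLR mean update with one global transform: mu_hat = a * mu + b.

    For fixed variances, the maximum-likelihood a and b are a weighted
    least-squares fit pooled over all Gaussians that have data.
    """
    # Accumulate the 2x2 normal equations:
    #   a * s_mm + b * s_m = s_mx
    #   a * s_m  + b * s_1 = s_x
    s_mm = s_m = s_1 = s_mx = s_x = 0.0
    for mu, var, (n, sx) in zip(means, variances, stats):
        s_mm += n * mu * mu / var
        s_m += n * mu / var
        s_1 += n / var
        s_mx += mu * sx / var
        s_x += sx / var
    det = s_mm * s_1 - s_m * s_m  # zero if fewer than two distinct means have data
    a = (s_mx * s_1 - s_m * s_x) / det
    b = (s_mm * s_x - s_m * s_mx) / det
    return [a * mu + b for mu in means]


# Speaker-independent model: three Gaussians with unit variance.
si_means = [0.0, 2.0, 4.0]
si_variances = [1.0, 1.0, 1.0]

# Adaptation data from a new speaker: the third Gaussian is never observed.
adapt_stats = [(5.0, 5.0), (5.0, 20.0), (0.0, 0.0)]

map_means = map_adapt_means(si_means, adapt_stats, tau=10.0)
mllr_means = mllr_adapt_means(si_means, si_variances, adapt_stats)
print("MAP :", map_means)
print("MLLR:", mllr_means)
```

In this example the third Gaussian keeps its speaker-independent mean under MAP because it has no adaptation frames, while MLLR shifts it using the transform learned from the other two. Real systems use multi-dimensional features and often several regression classes, which this sketch leaves out.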
