S. Mahiba, S. Christina, P. Vijayalakshmi, T. Nagarajan
2013 International Conference on Recent Trends in Information Technology (ICRTIT). Published 2013-07-25. DOI: 10.1109/ICRTIT.2013.6844235 (https://doi.org/10.1109/ICRTIT.2013.6844235)
Analysis of cross-gender adaptation using MAP and MLLR in speech recognition systems
A speech recognition system built on context-dependent phonemes captures the co-articulation effect and performs better than systems built on context-independent units. However, the performance of the system also depends on the speaker. This speaker dependence arises from speaker-dependent speech features, and variation in vocal tract length and shape is the major cause of the inter-speaker variation. As a result, speaker-dependent (SD) systems outperform speaker-independent (SI) systems. It is well established in the literature that speaker adaptation can raise the recognition performance of an SI system to the level of an SD system. This paper analyzes the amount and ratio of male and female training data for which cross-gender speaker adaptation gives the highest performance. The speaker adaptation techniques MAP (maximum a posteriori) and MLLR (maximum likelihood linear regression) are implemented using the TIMIT speech corpus. MLLR is observed to adapt the model parameters better than MAP even with only 24 s of adaptation data. It is also inferred that training the system with both male and female data gives better cross-gender adaptation performance than training with either male or female data alone, primarily because the system parameters differ greatly between male and female speakers. For the minimal amount of adaptation data, the overall recognition performance of the context-dependent system improves by 0.55% with MAP adaptation and 2.75% with MLLR adaptation over the unadapted recognition system.
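
The contrast between the two techniques on small amounts of adaptation data can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, which the abstract does not describe. The function names, the prior weight `tau`, the toy data, and the simplification that all Gaussians share one covariance (which reduces the global MLLR estimate to weighted least squares) are assumptions made for illustration. MAP updates each Gaussian mean only from frames aligned to that Gaussian, while MLLR estimates one affine transform shared by all means, so it also moves Gaussians that received no adaptation data.

```python
import numpy as np

def map_adapt_mean(prior_mean, frames, gamma, tau=10.0):
    """MAP update of one Gaussian mean.
    frames: (T, D) adaptation frames; gamma: (T,) occupation probabilities.
    tau is an assumed prior weight; with no data the prior mean is returned."""
    occ = gamma.sum()
    return (tau * prior_mean + gamma @ frames) / (tau + occ)

def extend(means):
    """Build extended mean vectors [1, mu] of shape (N, D+1)."""
    return np.hstack([np.ones((len(means), 1)), means])

def estimate_global_mllr(means, frames, gammas):
    """Estimate one global transform W = [b | A] of shape (D, D+1).
    Assumes all Gaussians share one covariance, so the maximum likelihood
    solution reduces to occupancy-weighted least squares."""
    xi = extend(means)                    # (N, D+1)
    occ = gammas.sum(axis=0)              # (N,) per-Gaussian occupancy
    Z = frames.T @ gammas @ xi            # (D, D+1)
    G = (xi * occ[:, None]).T @ xi        # (D+1, D+1)
    return Z @ np.linalg.pinv(G)

def mllr_adapt_means(means, W):
    """Apply the shared transform to every mean, seen or unseen."""
    return extend(means) @ W.T

# Toy setup (hypothetical numbers): five SI means in 2-D and a "true"
# speaker transform. Only the first four Gaussians get one frame each.
si_means = np.array([[0, 0], [1, 0], [0, 1], [2, 2], [3, 1]], dtype=float)
W_true = np.array([[0.5, 1.1, 0.0],
                   [-0.2, 0.0, 0.9]])
target_means = mllr_adapt_means(si_means, W_true)
frames = target_means[:4]
gammas = np.zeros((4, 5))
gammas[np.arange(4), np.arange(4)] = 1.0  # hard alignment, Gaussian 4 unseen

W_hat = estimate_global_mllr(si_means, frames, gammas)
mllr_means = mllr_adapt_means(si_means, W_hat)
map_seen = map_adapt_mean(si_means[0], frames, gammas[:, 0])
map_unseen = map_adapt_mean(si_means[4], frames, gammas[:, 4])
```

In this toy case the unseen fifth Gaussian stays at its SI mean under MAP, and a Gaussian with a single frame moves only a small fraction of the way toward the speaker, whereas the shared MLLR transform moves every mean. This matches the general expectation that MLLR is more effective than MAP when adaptation data is scarce.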