Multi-level adaptive network for accented Mandarin speech recognition
Huiyong Wang, Lan Wang, Xunying Liu
2014 4th IEEE International Conference on Information Science and Technology, 2014-04-26
DOI: 10.1109/ICIST.2014.6920550 (https://doi.org/10.1109/ICIST.2014.6920550)
Citations: 8
Abstract
Accented speech recognition is more challenging than standard speech recognition because of the acoustic and linguistic mismatch between standard and accented data. In this paper, we propose a new framework that combines a Tandem system, which improves the discriminative ability of the acoustic features, with a Multi-level Adaptive Network (MLAN), which incorporates information from a standard Mandarin corpus and alleviates the data sparseness problem. Mandarin spoken by Guangzhou speakers is treated as accented Mandarin (accented Putonghua, A-PTH), and Mandarin spoken by speakers from northern areas as standard Mandarin (standard Putonghua, S-PTH). The proposed system obtains significant relative character error rate reductions of 13.8% and 24.6% over baseline GMM-HMM systems trained on a mixed corpus containing both A-PTH and S-PTH data and on the A-PTH corpus alone, respectively.
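
The abstract reports its gains as relative character error rate (CER) reductions. As a minimal sketch of how such a figure is usually computed, the Python below scores a hypothesis against a reference by character-level edit distance and then compares two CER values. The function names, the example sentence, and the CER values 0.20 and 0.15 are illustrative assumptions; they are not taken from the paper, which does not state absolute error rates in the abstract.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """CER = (substitutions + deletions + insertions) / reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline_cer, new_cer):
    """Relative CER reduction of a new system over a baseline."""
    if baseline_cer <= 0:
        raise ValueError("baseline CER must be positive")
    return (baseline_cer - new_cer) / baseline_cer


# Hypothetical example: one substituted and one deleted character
# against a six-character reference.
cer = character_error_rate("今天天气很好", "今天天汽很")

# Hypothetical CER values, chosen only to illustrate the arithmetic.
gain = relative_reduction(0.20, 0.15)
```

Under this definition, a relative reduction of 13.8% means the new system's CER is 86.2% of the baseline's CER, whatever the absolute values are.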