Ji Xu, Zhen Zhang, Qingqing Zhang, Jielin Pan, Yonghong Yan
2013 Fourth Global Congress on Intelligent Systems. Published 2013-12-03. DOI: 10.1109/GCIS.2013.60 (https://doi.org/10.1109/GCIS.2013.60)
Improving Korean LVCSR with Long-Time Temporal Patterns and an Extended Phoneme Set
Korean is an agglutinative language in which pronunciation is affected by long-term context. This paper investigates long-time temporal information to improve Korean LVCSR. TRAP-based MLP features, which can exploit acoustic information scattered over several hundred milliseconds, are used to supplement the conventional cepstral features. In contrast to the traditional Korean phoneme set, which treats consonants in initial and final positions as identical, a more specific phoneme set is constructed by making consonants position dependent. On a Korean broadcast news speech recognition task, experiments show that these improvements reduce the character error rate by 25.3% relative to the baseline system.
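The idea of position-dependent consonants can be illustrated with a small sketch. It is not the authors' phoneme set or lexicon: it uses Hangul letters (jamo) as stand-ins for phonemes, relies only on the standard Unicode layout of precomposed Hangul syllables, and the `_I` / `_F` suffixes and the function names are hypothetical labels chosen for this example.

```python
# Standard Unicode ordering of jamo inside precomposed Hangul syllables.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"          # 19 initial consonants
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"       # 21 vowels
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28, index 0 = no final

HANGUL_BASE = 0xAC00
SYLLABLE_COUNT = len(INITIALS) * len(MEDIALS) * len(FINALS)  # 11172


def decompose(syllable):
    """Split one precomposed Hangul syllable into (initial, medial, final) jamo."""
    offset = ord(syllable) - HANGUL_BASE
    if not 0 <= offset < SYLLABLE_COUNT:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    initial_idx, rest = divmod(offset, len(MEDIALS) * len(FINALS))
    medial_idx, final_idx = divmod(rest, len(FINALS))
    return INITIALS[initial_idx], MEDIALS[medial_idx], FINALS[final_idx]


def position_dependent_units(text):
    """Label consonants by syllable position (hypothetical _I / _F suffixes)."""
    units = []
    for syllable in text:
        initial, medial, final = decompose(syllable)
        # An initial 'ㅇ' is a silent placeholder, so it yields no unit.
        if initial != "ㅇ":
            units.append(initial + "_I")
        units.append(medial)
        if final:
            units.append(final + "_F")
    return units


print(position_dependent_units("국"))
```

In this sketch the syllable "국" contains the same consonant letter in both positions, and the labeling keeps the two occurrences apart as separate units, which is the distinction a position-dependent set makes and a traditional merged set does not.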