M. Tariquzzaman, Song-Min Gyu, Kim Jin Young, Na Seung You, M. A. Rashid
2011 International Conference on Internet Computing and Information Services, 2011-09-17. DOI: 10.1109/ICICIS.2011.58
Performance Improvement of Audio-Visual Speech Recognition with Optimal Reliability Fusion
In state-of-the-art automatic speech recognition (ASR), speech recognition that uses both audio and video (AV) information is one of the key challenges in coping with the noise problem. AV fusion is one of the robust approaches to ASR. The main issues in AV fusion are where and how to integrate the information from the two modalities. To improve AV fusion performance, [1] proposed optimum reliability fusion (ORF) and applied it to AV speaker identification. In this paper, we adopt ORF-based fusion for AV speech recognition and evaluate the resulting performance improvement in that domain. The main idea of ORF is to introduce weighting factors into the score-based reliability measure (SCRM) to solve the problem of over- or under-estimation in the SCRM calculation. Our AV speech recognition system is implemented for Korean digit recognition using the SAMSUNG AV database. Experimental results show that ORF achieves a relative error-rate reduction of 42.8% compared with a baseline system that adopts the previous AV fusion scheme [2].
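The abstract describes ORF only at a high level: per-modality reliabilities derived from recognizer scores are corrected by weighting factors and then used to fuse the audio and video streams. The Python sketch below illustrates that general pattern under stated assumptions; it is not the ORF formulation of [1]. It assumes that (a) each modality's recognizer returns one log-likelihood per digit class, (b) the SCRM is a common dispersion measure, namely the mean gap between the best class score and the remaining N-1 scores, and (c) the hypothetical weighting factors `w_audio` and `w_video` simply scale each reliability before the two are normalized into stream weights. The abstract does not say how ORF obtains its weighting factors, so here they are plain parameters. All function names are hypothetical. The helper `relative_error_reduction` shows how a relative figure such as the reported 42.8% is defined: (E_base - E_new) / E_base.

```python
from typing import List, Sequence, Tuple


def dispersion_reliability(scores: Sequence[float]) -> float:
    """Assumed SCRM: mean gap between the best log-likelihood and the other N-1 scores.

    A flat score list (e.g. from noisy audio) yields a small value, i.e. low reliability.
    """
    if len(scores) < 2:
        raise ValueError("need at least two class scores")
    best = max(scores)
    # The best class contributes a zero gap, so dividing by N-1 averages over the others.
    return sum(best - s for s in scores) / (len(scores) - 1)


def fusion_weights(r_audio: float, r_video: float,
                   w_audio: float = 1.0, w_video: float = 1.0) -> Tuple[float, float]:
    """Hypothetical ORF-style correction: scale each reliability, then normalize to sum to 1."""
    a = w_audio * r_audio
    v = w_video * r_video
    total = a + v
    if total <= 0.0:
        # No usable reliability information: fall back to equal stream weights.
        return 0.5, 0.5
    return a / total, v / total


def fuse_and_decide(audio_scores: Sequence[float], video_scores: Sequence[float],
                    w_audio: float = 1.0, w_video: float = 1.0) -> Tuple[int, List[float]]:
    """Late fusion: weighted sum of per-class log-likelihoods, then pick the best class."""
    if len(audio_scores) != len(video_scores):
        raise ValueError("audio and video must score the same classes")
    lam_a, lam_v = fusion_weights(dispersion_reliability(audio_scores),
                                  dispersion_reliability(video_scores),
                                  w_audio, w_video)
    fused = [lam_a * a + lam_v * v for a, v in zip(audio_scores, video_scores)]
    # Index of the class with the highest fused score.
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused


def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Relative reduction, e.g. 0.428 means the error rate fell by 42.8% of its baseline value."""
    return (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    best, fused = fuse_and_decide([-10.0, -12.0, -11.0], [-20.0, -14.0, -30.0])
    print(best, [round(f, 2) for f in fused])
```

In the example call, the video scores are far more spread out than the audio scores (reliability 11.0 versus 1.5), so the video stream receives a weight of about 0.88 and class index 1 wins. Raising `w_audio` above 1 would counteract a reliability measure that under-estimates the audio stream, which is the kind of correction the abstract attributes to ORF.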