Visual speech recognition using Convolutional VEF snake and canonical correlations
Authors: Kun Lu, Yuwei Wu, Yunde Jia
Published in: 2010 IEEE Youth Conference on Information, Computing and Telecommunications (2010-11-01)
DOI: 10.1109/YCICT.2010.5713068 (https://doi.org/10.1109/YCICT.2010.5713068)
This paper presents a novel approach to automatic visual speech recognition using the Convolutional VEF snake and canonical correlations. Utterance image sequences of isolated Chinese words are recorded with a head-mounted camera, and the Convolutional VEF snake model is used to detect and track the lip boundary rapidly and accurately. Geometric and motion features are extracted from the lip contour sequences and concatenated to form a joint feature descriptor. Canonical correlation is applied to measure the similarity of two utterance feature matrices, and a linear discriminant function is introduced to further improve recognition accuracy. Experimental results demonstrate that the approach is promising and that the joint feature descriptor is more robust than either the geometric or the motion features used alone.
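
The similarity measure described above can be illustrated with a minimal sketch. It assumes the common subspace formulation of canonical correlations, in which each utterance feature matrix is reduced to an orthonormal basis and the correlations are the singular values of the product of the two bases. The abstract does not specify the authors' exact procedure, so the function names, the feature dimensions (6 geometric, 4 motion), the subspace dimension, and the mean-based similarity score are all hypothetical choices made for illustration. The random matrices stand in for real lip contour features, and the linear discriminant step is not shown.

```python
import numpy as np


def orthonormal_basis(features, dim):
    """Return `dim` orthonormal columns spanning the dominant subspace.

    `features` has shape (n_features, n_frames): one column per video frame.
    """
    u, _, _ = np.linalg.svd(features, full_matrices=False)
    return u[:, :dim]


def canonical_correlations(a, b, dim=3):
    """Cosines of the principal angles between the subspaces of `a` and `b`."""
    qa = orthonormal_basis(a, dim)
    qb = orthonormal_basis(b, dim)
    s = np.linalg.svd(qa.T @ qb, compute_uv=False)
    # Clip to guard against values marginally above 1 from rounding.
    return np.clip(s, 0.0, 1.0)


def similarity(a, b, dim=3):
    """Hypothetical score: mean of the canonical correlations."""
    return float(np.mean(canonical_correlations(a, b, dim)))


rng = np.random.default_rng(0)

# Placeholder per-frame features for one utterance of 20 frames (assumed sizes).
geometric = rng.normal(size=(6, 20))
motion = rng.normal(size=(4, 20))

# Joint descriptor: stack the two feature types frame by frame.
joint = np.vstack([geometric, motion])

# A second utterance with a different number of frames.
other = rng.normal(size=(10, 25))

self_sim = similarity(joint, joint)
cross_sim = similarity(joint, other)
print(f"self similarity:  {self_sim:.3f}")
print(f"cross similarity: {cross_sim:.3f}")
```

Because the comparison is made between subspaces of the feature space, the two utterances do not need the same number of frames, which is one reason this kind of measure suits variable-length sequences.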