{"title":"基于hmm的视觉语音识别系统的唇部特征提取与约简","authors":"S. Alizadeh, R. Boostani, V. Asadpour","doi":"10.1109/ICOSP.2008.4697195","DOIUrl":null,"url":null,"abstract":"Lipreading is a main part of audio-visual speech recognition systems which are mostly faced with redundancy of extracted features. In this paper, a new approach has been proposed to increase the lipreading performance by extraction of discriminant features. In this way, first, faces are detected; then, lip key points are extracted in which four cubic curves characterize lip contours. Next, the visual features are extracted from the contours for each frame. To discriminate each speech unit (word) from others, features of that speech unit frames are arranged in a feature vector. Moreover, differences of each frame features from k previous frame features are used to construct more informative feature vectors. To solve the small sample size problem, direct linear discriminant analysis (D-LDA) is employed to reduce the feature size. To classify these transformed features, hidden Markov model (HMM) is used to recognize the speech units. The proposed algorithm was applied on M2VTS database. Results show that applying of D-LDA for feature reduction provides the better classification accuracy compare to employ HMM without feature reduction.","PeriodicalId":445699,"journal":{"name":"2008 9th International Conference on Signal Processing","volume":"524 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"25","resultStr":"{\"title\":\"Lip feature extraction and reduction for HMM-based visual speech recognition systems\",\"authors\":\"S. Alizadeh, R. Boostani, V. Asadpour\",\"doi\":\"10.1109/ICOSP.2008.4697195\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Lipreading is a main part of audio-visual speech recognition systems which are mostly faced with redundancy of extracted features. In this paper, a new approach has been proposed to increase the lipreading performance by extraction of discriminant features. In this way, first, faces are detected; then, lip key points are extracted in which four cubic curves characterize lip contours. Next, the visual features are extracted from the contours for each frame. To discriminate each speech unit (word) from others, features of that speech unit frames are arranged in a feature vector. Moreover, differences of each frame features from k previous frame features are used to construct more informative feature vectors. To solve the small sample size problem, direct linear discriminant analysis (D-LDA) is employed to reduce the feature size. To classify these transformed features, hidden Markov model (HMM) is used to recognize the speech units. The proposed algorithm was applied on M2VTS database. 
Results show that applying of D-LDA for feature reduction provides the better classification accuracy compare to employ HMM without feature reduction.\",\"PeriodicalId\":445699,\"journal\":{\"name\":\"2008 9th International Conference on Signal Processing\",\"volume\":\"524 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2008-12-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"25\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2008 9th International Conference on Signal Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICOSP.2008.4697195\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 9th International Conference on Signal Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICOSP.2008.4697195","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Lip feature extraction and reduction for HMM-based visual speech recognition systems
Lipreading is a main component of audio-visual speech recognition systems, which often suffer from redundancy in the extracted features. In this paper, a new approach is proposed to improve lipreading performance through the extraction of discriminant features. First, faces are detected; then, lip key points are extracted and four cubic curves are fitted to characterize the lip contours. Next, visual features are extracted from these contours for each frame. To discriminate each speech unit (word) from the others, the features of that unit's frames are arranged into a feature vector. Moreover, the differences between each frame's features and those of the k previous frames are used to construct more informative feature vectors. To address the small-sample-size problem, direct linear discriminant analysis (D-LDA) is employed to reduce the feature dimensionality. A hidden Markov model (HMM) is then used to classify the transformed features and recognize the speech units. The proposed algorithm was evaluated on the M2VTS database. Results show that applying D-LDA for feature reduction yields better classification accuracy than using the HMM without feature reduction.
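As a rough illustration of the pipeline described above, the sketch below augments per-frame features with k-frame differences, reduces them with LDA, and scores word-level HMMs. This is a minimal sketch under stated assumptions, not the authors' implementation: scikit-learn's standard LinearDiscriminantAnalysis stands in for D-LDA, hmmlearn's GaussianHMM stands in for the paper's HMM, the per-frame lip-contour features are assumed to be precomputed, and the helper names (add_deltas, train_word_models, recognize) are hypothetical.

```python
# Hypothetical sketch of the abstract's pipeline. Standard LDA and hmmlearn's
# GaussianHMM are stand-ins for the paper's D-LDA and HMM; lip-contour
# features are assumed to be precomputed as (n_frames, n_features) arrays.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from hmmlearn.hmm import GaussianHMM

def add_deltas(frames: np.ndarray, k: int = 2) -> np.ndarray:
    """Append the difference between each frame and the frame k steps earlier."""
    deltas = frames - np.roll(frames, k, axis=0)
    deltas[:k] = 0.0  # no earlier frame available for the first k frames
    return np.hstack([frames, deltas])

def train_word_models(words, n_states=3):
    """words: dict mapping word label -> list of (n_frames, n_features) arrays."""
    # Fit LDA on frame-level features pooled over all words
    # (the paper uses D-LDA here to avoid the small-sample-size problem).
    X = np.vstack([add_deltas(seq) for seqs in words.values() for seq in seqs])
    y = np.concatenate([[label] * seq.shape[0]
                        for label, seqs in words.items() for seq in seqs])
    lda = LinearDiscriminantAnalysis().fit(X, y)

    # One HMM per word, trained on the reduced feature sequences.
    models = {}
    for label, seqs in words.items():
        reduced = [lda.transform(add_deltas(seq)) for seq in seqs]
        lengths = [len(r) for r in reduced]
        hmm = GaussianHMM(n_components=n_states, covariance_type="diag")
        hmm.fit(np.vstack(reduced), lengths)
        models[label] = hmm
    return lda, models

def recognize(seq, lda, models):
    """Return the word whose HMM gives the highest log-likelihood."""
    obs = lda.transform(add_deltas(seq))
    return max(models, key=lambda label: models[label].score(obs))
```

In this sketch the LDA projection is fitted on frames from all words jointly, so the reduced features carry the class-discriminative directions before the per-word HMMs are trained, mirroring the reduce-then-classify order described in the abstract.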