Speaker dependent visual word recognition by using sequential mouth shape codes
Takuro Tasaka, N. Hamada
2012 International Symposium on Intelligent Signal Processing and Communications Systems, 2012-11-01
DOI: 10.1109/ISPACS.2012.6473460 (https://doi.org/10.1109/ISPACS.2012.6473460)
Citations: 2
Abstract
Visual speech recognition, or lip reading, makes speech recognition robust to noise by adding the speaker's visual cues to the audio information. Visual-only speech recognition is applicable to speaker verification and to multimedia interfaces that support people with speech impairments. The sequential mouth-shape code method is an effective lip-reading approach, particularly for uttered Japanese words; it uses two kinds of distinctive mouth shapes, known as first and last mouth shapes, which appear intermittently. One advantage of this method is its low computational burden for the learning and word registration processes. This paper proposes a novel word recognition system based on lip reading that detects and determines initial mouth-shape codes in order to recognize the uttered consonants. The proposed method can therefore discriminate between words that consist of the same sequence of vowel codes but contain different consonant codes. Experiments demonstrate that the proposed system achieves a higher recognition rate than conventional ones.