Speaker dependent visual word recognition by using sequential mouth shape codes

2012 International Symposium on Intelligent Signal Processing and Communications Systems. Pub Date: 2012-11-01. DOI: 10.1109/ISPACS.2012.6473460

Takuro Tasaka, N. Hamada


Citations: 2

Abstract

Visual speech recognition, or lip reading, makes speech recognition robust to noise by adding the speaker's visual cues to the audio information. Visual-only speech recognition is also applicable to speaker verification and to multimedia interfaces that support people with speech impairments. The sequential mouth-shape code method is an effective lip-reading approach, particularly for uttered Japanese words: it uses two kinds of distinctive mouth shapes, known as first and last mouth shapes, which appear intermittently during an utterance. One advantage of this method is the low computational burden of its learning and word registration processes. This paper proposes a novel word lip-reading system that detects and determines initial mouth-shape codes in order to recognize the consonants being uttered. The proposed method can thereby discriminate between words that share the same sequence of vowel codes but contain different consonant codes. The experiments demonstrate that the proposed system achieves a higher recognition rate than conventional ones.
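The disambiguation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the code alphabet (`"m"` for a closed-lip initial shape, `"-"` for no distinctive initial shape), the example words, and the function names are all assumptions made for illustration. The sketch only shows why a registry keyed on vowel (last mouth shape) codes alone confuses two words, while a key that also includes initial mouth-shape codes separates them.

```python
from collections import defaultdict

# Hypothetical registered words. Each syllable is an (initial, last) pair:
#   initial: assumed code for the mouth shape formed before the vowel
#            ("m" = lips closed, "-" = no distinctive initial shape)
#   last:    assumed code for the vowel mouth shape
WORDS = {
    "ame": [("-", "a"), ("m", "e")],
    "ase": [("-", "a"), ("-", "e")],
    "umi": [("-", "u"), ("m", "i")],
}

def vowel_key(codes):
    """Key that keeps only the sequence of last (vowel) codes."""
    return tuple(last for _, last in codes)

def full_key(codes):
    """Key that keeps both initial and last codes for every syllable."""
    return tuple(codes)

def build_registry(words, key_fn):
    """Word registration: group registered words by their code-sequence key."""
    registry = defaultdict(list)
    for word, codes in words.items():
        registry[key_fn(codes)].append(word)
    return registry

def recognize(observed, registry, key_fn):
    """Return every registered word whose key matches the observed codes."""
    return registry.get(key_fn(observed), [])

vowel_registry = build_registry(WORDS, vowel_key)
full_registry = build_registry(WORDS, full_key)

observed = [("-", "a"), ("m", "e")]  # assumed detector output for "ame"
print(recognize(observed, vowel_registry, vowel_key))  # ['ame', 'ase']
print(recognize(observed, full_registry, full_key))    # ['ame']
```

In this toy setup, registration is a dictionary insert and recognition is a lookup, which is consistent with the low computational burden the abstract attributes to the code-based approach. Detecting the codes from video is outside the scope of the sketch.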


