Lip-based visual speech recognition system

2015 International Carnahan Conference on Security Technology (ICCST), Pub Date: 2015-09-01, DOI: 10.1109/CCST.2015.7389703

A. Frisky, Chien-Yao Wang, A. Santoso, Jia-Ching Wang

Citations: 7

Abstract

This paper proposes a system for visual speech recognition that recognizes visual lip movements using video content analysis techniques. A preprocessing step removes noise and enhances the contrast of every video frame. Spatiotemporal feature descriptors are used to extract features from video containing visual lip information. The extracted features are used to build the dictionary of a kernel sparse representation classifier (K-SRC) for the classification step, and non-negative matrix factorization (NMF) is adopted to reduce their dimensionality. We evaluated the system on the AVLetters and AVLetters2 datasets, using the same configurations as previous works. On AVLetters, the system achieves accuracies of 67.13%, 45.37%, and 63.12% in the semi-speaker-dependent, speaker-independent, and speaker-dependent settings, respectively. On AVLetters2, it achieves 89.02% accuracy in the speaker-dependent setting and 25.9% in the speaker-independent setting. These results show that the proposed method outperforms other methods evaluated under the same configurations.
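The abstract says each frame is denoised and contrast-enhanced but does not name the filters. The sketch below shows one plausible per-frame preprocessing step. It assumes grayscale clips stored as a (T, H, W) NumPy array, a 3 x 3 median filter for noise removal and percentile-based linear contrast stretching. The function names and parameters are hypothetical, not taken from the paper.

```python
import numpy as np


def median_denoise(frame: np.ndarray, k: int = 3) -> np.ndarray:
    """Apply a k x k median filter to one grayscale frame (edges replicated)."""
    if k % 2 == 0:
        raise ValueError("k must be odd")
    r = k // 2
    padded = np.pad(frame, r, mode="edge")
    h, w = frame.shape
    # Collect every shifted copy covering the k x k neighbourhood, then take the median.
    shifts = [padded[dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)]
    return np.median(np.stack(shifts), axis=0)


def stretch_contrast(frame: np.ndarray, low_pct: float = 1.0, high_pct: float = 99.0) -> np.ndarray:
    """Linearly map the [low_pct, high_pct] percentile range of the frame to [0, 1]."""
    lo, hi = np.percentile(frame, [low_pct, high_pct])
    if hi <= lo:  # flat frame: nothing to stretch
        return np.zeros_like(frame, dtype=float)
    return np.clip((frame - lo) / (hi - lo), 0.0, 1.0)


def preprocess_video(video: np.ndarray) -> np.ndarray:
    """Denoise and contrast-enhance every frame of a (T, H, W) grayscale clip."""
    return np.stack([stretch_contrast(median_denoise(f.astype(float))) for f in video])
```

A median filter is a common choice for the impulse-like noise seen in low-quality video. Histogram equalization would be an equally reasonable substitute for the contrast step.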
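The abstract states that NMF reduces the dimensionality of the extracted features but does not give the algorithm or target rank. The following is a minimal sketch using the Lee and Seung multiplicative updates for the Frobenius-norm objective. It assumes the spatiotemporal descriptors are non-negative (for example, histogram-based), which NMF requires. Feature vectors are assumed to be the columns of `V`, and the reduced features are the columns of `H`. Encoding test samples against a fixed basis `W` is also an assumption. `nmf` and `nmf_transform` are hypothetical names.

```python
import numpy as np


def nmf(V: np.ndarray, rank: int, n_iter: int = 500, eps: float = 1e-10, seed: int = 0):
    """Factor a non-negative (n_features, n_samples) matrix as V ~= W @ H.

    Uses Lee and Seung multiplicative updates for ||V - W H||_F^2.
    Returns W (n_features, rank) and H (rank, n_samples); the columns of H
    are the reduced-dimension features.
    """
    if np.any(V < 0):
        raise ValueError("NMF requires a non-negative input matrix")
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative; eps avoids division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def nmf_transform(V: np.ndarray, W: np.ndarray, n_iter: int = 500, eps: float = 1e-10, seed: int = 0):
    """Encode new columns of V against a fixed basis W (only H is updated)."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H
```

Learning `W` on training clips only and then encoding test clips with `nmf_transform` keeps test data out of the basis. Library implementations such as scikit-learn's `NMF` offer the same multiplicative-update solver with more stopping criteria.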
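In a kernel sparse representation classifier, a test sample is coded as a sparse combination of the training samples (the dictionary) in a kernel-induced feature space. It is then assigned to the class whose coefficients alone reconstruct it best. The paper's kernel, sparsity weight and solver are not given in the abstract. This sketch assumes a Gaussian (RBF) kernel and solves the L1-regularized kernel coding problem with ISTA (iterative soft thresholding), using only kernel evaluations. `ksrc_predict`, `gamma` and `lam` are hypothetical.

```python
import numpy as np


def rbf_kernel(A: np.ndarray, B: np.ndarray, gamma: float) -> np.ndarray:
    """Gaussian kernel exp(-gamma * ||a - b||^2) between rows of A and rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def soft_threshold(v: np.ndarray, t: float) -> np.ndarray:
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def ksrc_predict(X_train, y_train, x, gamma=1.0, lam=0.01, n_iter=300):
    """Classify x by kernel sparse coding over the training dictionary.

    Minimises ||phi(x) - Phi(X) a||^2 + lam * ||a||_1 with ISTA, then picks
    the class whose coefficients alone give the smallest feature-space residual.
    """
    K = rbf_kernel(X_train, X_train, gamma)              # Gram matrix of the dictionary
    k_x = rbf_kernel(X_train, x[None, :], gamma)[:, 0]   # kernel between each atom and x
    step = 1.0 / (2.0 * np.linalg.eigvalsh(K)[-1])       # 1 / Lipschitz constant of the gradient
    a = np.zeros(len(X_train))
    for _ in range(n_iter):
        grad = 2.0 * (K @ a - k_x)                       # gradient of the quadratic term
        a = soft_threshold(a - step * grad, step * lam)
    labels = np.unique(y_train)
    residuals = []
    for c in labels:
        a_c = np.where(y_train == c, a, 0.0)             # keep class-c coefficients only
        # ||phi(x) - Phi a_c||^2, using k(x, x) = 1 for the RBF kernel
        residuals.append(1.0 - 2.0 * a_c @ k_x + a_c @ K @ a_c)
    return labels[int(np.argmin(residuals))]
```

For a real dataset, the Gram matrix `K` and its largest eigenvalue would be computed once rather than per query. `gamma` and `lam` would typically be tuned on held-out data.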
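The reported accuracies depend on how utterances are split between training and testing, and the abstract only names the settings. The following sketch shows two commonly used definitions, stated here as assumptions. In a speaker-independent split, no test speaker appears in training (leave one speaker out). In a speaker-dependent split, the same speakers appear in both sets and one repetition of each utterance is held out. The semi-speaker-dependent setting used for AVLetters is not modeled, and both function names are hypothetical.

```python
from collections.abc import Iterator, Sequence


def leave_one_speaker_out(speakers: Sequence[str]) -> Iterator[tuple[str, list[int], list[int]]]:
    """Speaker-independent folds: each speaker is held out for testing exactly once."""
    for held_out in sorted(set(speakers)):
        train = [i for i, s in enumerate(speakers) if s != held_out]
        test = [i for i, s in enumerate(speakers) if s == held_out]
        yield held_out, train, test


def speaker_dependent_split(repetitions: Sequence[int], test_rep: int) -> tuple[list[int], list[int]]:
    """Speaker-dependent split: one repetition index is held out for testing."""
    train = [i for i, r in enumerate(repetitions) if r != test_rep]
    test = [i for i, r in enumerate(repetitions) if r == test_rep]
    return train, test
```

For example, with `speakers = ["A", "A", "B", "B", "C"]` the first fold holds out speaker `"A"`, training on indices `[2, 3, 4]` and testing on `[0, 1]`. The large gap between speaker-dependent and speaker-independent accuracy in the abstract is consistent with how much harder the second protocol is.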
