Lip-based visual speech recognition system

A. Frisky, Chien-Yao Wang, A. Santoso, Jia-Ching Wang
2015 International Carnahan Conference on Security Technology (ICCST), 2015-09-01. DOI: 10.1109/CCST.2015.7389703 (https://doi.org/10.1109/CCST.2015.7389703)
Citations: 7

Abstract

This paper proposes a system for visual speech recognition. The system recognizes visual lip movement by applying video content analysis techniques. Using spatiotemporal feature descriptors, we extract features from video containing visual lip information. A preprocessing step removes noise and enhances the contrast of the images in every frame of the video. The extracted features are used to build a dictionary for a kernel sparse representation classifier (K-SRC) in the classification step. We adopt non-negative matrix factorization (NMF) to reduce the dimensionality of the extracted features. We evaluate the system on the AVLetters and AVLetters2 datasets, using the same configuration as previous works. On AVLetters, the system achieves accuracies of 67.13%, 45.37%, and 63.12% in the semi-speaker-dependent, speaker-independent, and speaker-dependent settings, respectively. On AVLetters2, it achieves 89.02% in the speaker-dependent case and 25.9% in the speaker-independent case. These results show that the proposed method outperforms other methods that use the same configuration.
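
As an illustration of the dimensionality-reduction step, the following is a minimal sketch of NMF using the standard multiplicative update rules, written in plain Python. It is not the authors' implementation: the abstract does not say which NMF algorithm, rank, or iteration count was used, so the function names (`nmf`, `frob_error`), the rank of 2, and the random toy matrix standing in for non-negative lip features are all assumptions made for the example.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frob_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))


def nmf(v, rank, iters=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (features x samples) as W @ H.

    W is (features x rank) and H is (rank x samples); both stay non-negative
    because each update multiplies by a ratio of non-negative terms.
    The rank, iteration count, and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_feat, n_samp = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_feat)]
    h = [[rng.random() + eps for _ in range(n_samp)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_samp)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_feat)]
    return w, h


# Toy data (assumption): 6 non-negative features for 8 samples, built from
# two hidden non-negative components so a rank-2 factorization fits well.
rng = random.Random(1)
true_w = [[rng.random() for _ in range(2)] for _ in range(6)]
true_h = [[rng.random() for _ in range(8)] for _ in range(2)]
V = matmul(true_w, true_h)

W, H = nmf(V, rank=2)
print("reconstruction error:", frob_error(V, W, H))
print("reduced feature vector of sample 0:", [row[0] for row in H])
```

Each column of `H` is the reduced representation of one sample (here 2 values instead of 6). In a pipeline like the one described, such reduced vectors could serve as the atoms from which a classifier dictionary is built.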