A Visual Speech Feature to Identify the Speaking States from Video

2010 International Conference on Multimedia Technology. Pub Date: 2010-11-11. DOI: 10.1109/ICMULT.2010.5629829

Xibin Jia, Baocai Yin, Yanfeng Sun

Citations: 0

Abstract

This paper proposes a visual speech feature for speaking-mouth images taken from video, combining shape cues with local teeth texture. The proposed geometric feature is based on the Euclidean distances between the feature points around the inner and outer lip contours. To describe the visibility of the teeth, color moments are computed on the local texture, using the G and B components as the baseline. The two features are combined by weighted fusion. The k-means algorithm is used to assess the feature's performance by evaluating the clustering results. The results show that deriving the local texture from the G and B color components models teeth visibility better than the alternatives compared, and that the proposed feature distinguishes visemes better than either PCA or the geometric feature alone.
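The feature construction described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which point pairs are measured, which color moments are used, or how the fusion weight is chosen. Here all pairwise distances are taken, the first three color moments are computed on channels 1 and 2 of an RGB array (G and B), and each part is scaled to unit length before a weighted concatenation. The function names `geometric_feature`, `color_moments`, and `fuse`, and the weight `w`, are hypothetical.

```python
import numpy as np


def geometric_feature(points):
    """Pairwise Euclidean distances between lip landmarks.

    Assumption: every pair of points is used; the abstract does not
    specify the exact pairing.
    """
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(len(pts), k=1)  # each pair once
    return dist[upper]


def color_moments(region_rgb, channels=(1, 2)):
    """Mean, standard deviation, and cube root of the third central
    moment for the chosen channels (default: G and B of an RGB array).

    Assumption: these three moments are a common choice; the abstract
    only says "color moment".
    """
    region = np.asarray(region_rgb, dtype=float)
    feats = []
    for c in channels:
        values = region[..., c].ravel()
        mean = values.mean()
        std = values.std()
        third = np.cbrt(((values - mean) ** 3).mean())
        feats.extend([mean, std, third])
    return np.array(feats)


def fuse(geom, tex, w=0.5):
    """Weighted concatenation of the two parts.

    Assumption: each part is scaled to unit length first so that the
    hypothetical weight w controls their relative influence.
    """
    def unit(x):
        norm = np.linalg.norm(x)
        return x / norm if norm > 0 else x

    return np.concatenate([w * unit(geom), (1.0 - w) * unit(tex)])


if __name__ == "__main__":
    lip_points = [(0, 0), (3, 4), (0, 4)]            # toy landmarks
    mouth_region = np.full((4, 4, 3), 120.0)         # toy RGB patch
    feature = fuse(geometric_feature(lip_points),
                   color_moments(mouth_region), w=0.6)
    print(feature.shape)
```

Feature vectors built this way for many frames could then be passed to any k-means implementation to inspect how well the clusters separate mouth states, in the spirit of the evaluation the abstract describes.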


