Visual speech recognition using wavelet transform and moment based features

ICINCO-RA, Pub Date: 2016-11-30, DOI: 10.5220/0001210203660371

W. C. Yau, D. Kumar, S. Arjunan, Sanjay Kumar


Citations: 1

Abstract

This paper presents a novel vision-based approach to identifying utterances consisting of consonants. A view-based method represents the 3-D image sequence of the mouth movement in a 2-D space using grayscale images called motion history images (MHIs). An MHI is produced by applying an accumulative image differencing technique to the image sequence, which implicitly captures the temporal information of the mouth movement. The proposed technique combines the discrete stationary wavelet transform (SWT) and image moments to classify the MHI. A 2-D SWT at level 1 decomposes the MHI into one approximation sub-image and three detail sub-images. The paper reports the classification accuracy of three different moment-based features, namely Zernike moments, geometric moments, and Hu moments, computed from the approximation sub-image of the MHI. A supervised feed-forward multilayer perceptron (MLP) artificial neural network (ANN), trained with the back-propagation learning algorithm, classifies the moment-based features. The performance and image representation ability of the three moment features are compared. The preliminary results show that all three moment types can achieve a high recognition rate in the classification of 3 consonants.
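The front end of the pipeline described above (MHI, level-1 SWT approximation, moment features) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the wavelet, the difference threshold, or the MHI scaling, so the Haar filter, the threshold of 10, the periodic border handling, and all function names below are hypothetical choices. Only the first two Hu invariants are computed, and the Zernike moments and the MLP classifier are omitted.

```python
# Illustrative sketch only: the threshold, Haar filter, periodic borders,
# and all helper names are assumptions, not taken from the paper.

def motion_history_image(frames, threshold=10):
    """Accumulate thresholded frame differences into one grayscale image.

    Pixels that changed more recently receive larger values (scaled to 0..255).
    Requires at least two frames of equal size.
    """
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    mhi = [[0.0] * cols for _ in range(rows)]
    for t in range(1, n):
        for y in range(rows):
            for x in range(cols):
                if abs(frames[t][y][x] - frames[t - 1][y][x]) >= threshold:
                    # A later change overwrites an earlier one with a higher value.
                    mhi[y][x] = 255.0 * t / (n - 1)
    return mhi


def haar_swt_approximation(img):
    """Level-1 undecimated Haar low-pass band; output has the input's size."""
    rows, cols = len(img), len(img[0])
    return [[(img[y][x] + img[y][(x + 1) % cols]
              + img[(y + 1) % rows][x]
              + img[(y + 1) % rows][(x + 1) % cols]) / 2.0
             for x in range(cols)] for y in range(rows)]


def raw_moment(img, p, q):
    """Geometric moment m_pq = sum over pixels of x^p * y^q * f(x, y)."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img) for x, v in enumerate(row))


def hu_first_two(img):
    """First two Hu invariants from normalized central moments."""
    m00 = raw_moment(img, 0, 0)
    xc = raw_moment(img, 1, 0) / m00
    yc = raw_moment(img, 0, 1) / m00

    def eta(p, q):
        # Central moment about the centroid, normalized for scale.
        mu = sum(((x - xc) ** p) * ((y - yc) ** q) * v
                 for y, row in enumerate(img) for x, v in enumerate(row))
        return mu / m00 ** (1 + (p + q) / 2.0)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2


# Tiny synthetic "mouth" sequence: one pixel changes per frame.
frames = [
    [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
    [[0, 0, 0, 0], [0, 50, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
    [[0, 0, 0, 0], [0, 50, 50, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
]
mhi = motion_history_image(frames)
approx = haar_swt_approximation(mhi)
features = hu_first_two(approx)
print(features)
```

In this sketch the resulting feature tuple would be what a classifier such as the MLP receives. A real implementation would more likely use an array library and an established wavelet package than nested lists, which are used here only to keep the example dependency-free.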
