Hidden Conditional Random Fields for Visual Speech Recognition

2009 13th International Machine Vision and Image Processing Conference. Pub Date: 2009-09-02. DOI: 10.1109/IMVIP.2009.28

Adrian Pass, Jianguo Zhang, D. Stewart


Citations: 2

Abstract




This paper applies Hidden Conditional Random Fields (HCRFs) to modeling speech for visual speech recognition. HCRFs are easily adapted to model long-range dependencies across an observation sequence. This lets the model take a more contextual approach to generating state sequences, which can improve visual word recognition performance. We report results on a speaker-dependent, isolated-digit visual speech recognition task and compare them against a baseline HMM system. First, we show that HCRF word recognition rates on clean video can be improved by increasing the number of past and future observations that each state takes into account. Second, we compare model performance when various levels of video compression are applied to the test set. To our knowledge, this is the first attempted use of HCRFs for visual speech recognition.
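The abstract describes two ideas that a short sketch may make more concrete: each state looking at a window of past and future observations, and an HCRF scoring a word by summing over hidden state sequences. The Python below is only an illustration under stated assumptions and is not the authors' implementation. The function names (`stack_context`, `sequence_log_score`, `class_posteriors`), the edge padding by repeating the first and last frame, the linear per-state observation weights, and the per-word state sets are all hypothetical choices made for the example.

```python
import math


def stack_context(frames, window):
    """Concatenate each frame with `window` past and `window` future frames.

    Assumption: sequence edges are padded by repeating the first/last frame.
    """
    n = len(frames)
    stacked = []
    for t in range(n):
        row = []
        for offset in range(-window, window + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp index to the sequence
            row.extend(frames[idx])
        stacked.append(row)
    return stacked


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_log_score(obs, state_weights, transitions):
    """Log of the summed exp-scores over all hidden state paths (forward pass).

    state_weights[s] is a hypothetical weight vector applied to the windowed
    observation; transitions[a][b] is a hypothetical score for moving a -> b.
    """
    n_states = len(state_weights)

    def emit(s, x):
        return sum(w * v for w, v in zip(state_weights[s], x))

    alpha = [emit(s, obs[0]) for s in range(n_states)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[a] + transitions[a][s] for a in range(n_states)])
            + emit(s, x)
            for s in range(n_states)
        ]
    return logsumexp(alpha)


def class_posteriors(obs, models):
    """Normalize per-word path sums into p(word | observations).

    models maps a word label to (state_weights, transitions).
    """
    scores = {
        label: sequence_log_score(obs, sw, tr)
        for label, (sw, tr) in models.items()
    }
    norm = logsumexp(list(scores.values()))
    return {label: math.exp(s - norm) for label, s in scores.items()}


if __name__ == "__main__":
    # Toy 2-dimensional "visual features" for four video frames (made up).
    frames = [[0.1, 0.0], [0.4, 0.2], [0.3, 0.5], [0.0, 0.1]]
    obs = stack_context(frames, window=1)  # each row now has 6 values
    models = {
        "one": ([[0.5] * 6, [-0.5] * 6], [[0.0, 0.1], [-0.2, 0.0]]),
        "two": ([[-0.3] * 6, [0.2] * 6], [[0.0, -0.1], [0.3, 0.0]]),
    }
    print(class_posteriors(obs, models))
```

Increasing `window` widens the context available to every state without changing the recursion, which is the kind of adjustment the abstract refers to when it mentions taking more past and future observations into account.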
