Hidden Conditional Random Fields for Visual Speech Recognition

Adrian Pass, Jianguo Zhang, D. Stewart
Published in: 2009 13th International Machine Vision and Image Processing Conference, 2009-09-02
DOI: 10.1109/IMVIP.2009.28 (https://doi.org/10.1109/IMVIP.2009.28)
Citations: 2

Abstract

This paper applies Hidden Conditional Random Fields (HCRFs) to modeling speech for visual speech recognition. HCRFs can be readily adapted to model long-range dependencies across an observation sequence. As a result, visual word recognition performance can improve, because the model can draw on more context when generating state sequences. Results are presented for a speaker-dependent, isolated-digit visual speech recognition task and compared against a baseline HMM system. First, we show that HCRF word recognition rates on clean video can be improved by increasing the number of past and future observations that each state takes into account. Second, we compare model performance under various levels of video compression applied to the test set. As far as we are aware, this is the first attempted use of HCRFs for visual speech recognition.
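The idea of letting each state see more past and future observations can be illustrated with a small sketch. The abstract does not say how the observation window is built, so everything here is an assumption made for illustration: the function name `stack_context`, the symmetric window size `w`, and the choice to repeat the first or last frame at the sequence edges. It shows only one simple way to give a per-frame model access to neighbouring frames, not the authors' implementation.

```python
from typing import List, Sequence


def stack_context(frames: Sequence[Sequence[float]], w: int) -> List[List[float]]:
    """Concatenate each frame with its w past and w future neighbours.

    frames: one feature vector per video frame (hypothetical input format).
    w: number of past and of future frames to include (hypothetical parameter).
    Returns one vector per frame, (2 * w + 1) times as long as the input vectors.
    """
    if w < 0:
        raise ValueError("w must be non-negative")
    last = len(frames) - 1
    stacked: List[List[float]] = []
    for t in range(len(frames)):
        row: List[float] = []
        for offset in range(-w, w + 1):
            # Clamp the index so windows at the sequence edges reuse the
            # first or last frame (an assumption made for this sketch).
            idx = min(max(t + offset, 0), last)
            row.extend(frames[idx])
        stacked.append(row)
    return stacked
```

With `w = 0` each frame is returned unchanged. Larger values of `w` widen the context available at each time step, at the cost of a feature vector that grows linearly with the window size.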