Multi-modal temporal asynchronicity modeling by product HMMs for robust audio-visual speech recognition

Satoshi Nakamura, K. Kumatani, S. Tamura
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces, 2002-10-14
DOI: 10.1109/ICMI.2002.1167011 (https://doi.org/10.1109/ICMI.2002.1167011)
Citations: 14

Abstract

Demand for audio-visual speech recognition (AVSR) has grown as a way to make speech recognition systems robust to acoustic noise. Two research issues arise in audio-visual speech recognition: integration modeling that accounts for asynchronicity between the modalities, and adaptive information weighting according to the reliability of each information source. This paper proposes a method to integrate audio and visual information effectively. Such integration inevitably requires modeling both the synchronicity and the asynchronicity of audio and visual information. To address the time lag and correlation problems in individual features between speech and lip movements, we introduce a type of integrated HMM modeling of audio-visual information based on a family of product HMMs. The proposed model can represent state synchronicity not only within a phoneme, but also between phonemes. We also propose a rapid stream weight optimization based on the GPD algorithm for noisy, bimodal speech recognition. Evaluation experiments show that the proposed method improves recognition accuracy for noisy speech. At an SNR of 0 dB, the proposed method attained 16% higher performance than a product HMM without synchronicity re-estimation.
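To make the idea of a product HMM concrete, the following is a minimal sketch of the textbook construction, not the authors' implementation. It assumes the audio and visual chains move independently, so the composite transition matrix is the Kronecker product of the two per-stream matrices, and it combines the per-stream observation scores with stream weights that sum to 1. The function names, the 3-state left-to-right matrices, and the fixed weight are all hypothetical. The synchronicity re-estimation and the GPD-based weight optimization described in the abstract are not implemented here.

```python
def product_transitions(a_audio, a_visual):
    """Build composite transitions as the Kronecker product of two
    row-stochastic matrices (assumes the two streams move independently).

    Composite state (i, j) is flattened to index i * n_visual + j.
    """
    n_a, n_v = len(a_audio), len(a_visual)
    size = n_a * n_v
    a_prod = [[0.0] * size for _ in range(size)]
    for i in range(n_a):
        for j in range(n_v):
            for k in range(n_a):
                for m in range(n_v):
                    a_prod[i * n_v + j][k * n_v + m] = a_audio[i][k] * a_visual[j][m]
    return a_prod


def weighted_log_likelihood(log_b_audio, log_b_visual, audio_weight):
    """Stream-weighted observation score for one composite state.

    Assumes the two stream weights sum to 1; the weight is given here,
    not optimized.
    """
    return audio_weight * log_b_audio + (1.0 - audio_weight) * log_b_visual


# Hypothetical 3-state left-to-right HMMs for one phoneme.
A_AUDIO = [[0.6, 0.4, 0.0], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]]
A_VISUAL = [[0.5, 0.5, 0.0], [0.0, 0.8, 0.2], [0.0, 0.0, 1.0]]

A_PRODUCT = product_transitions(A_AUDIO, A_VISUAL)  # 9 x 9 composite matrix
```

Off-diagonal composite states such as (audio state 1, visual state 0) are what allow one modality to lead or lag the other. In this sketch the composite transitions are fixed by the independence assumption, whereas the abstract's method re-estimates them.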