Multimodal speech recognition of a person with articulation disorders using AAM and MAF

Chikoto Miyamoto, Yuto Komai, T. Takiguchi, Y. Ariki, I. Li
{"title":"Multimodal speech recognition of a person with articulation disorders using AAM and MAF","authors":"Chikoto Miyamoto, Yuto Komai, T. Takiguchi, Y. Ariki, I. Li","doi":"10.1109/MMSP.2010.5662075","DOIUrl":null,"url":null,"abstract":"We investigated the speech recognition of a person with articulation disorders resulting from athetoid cerebral palsy. The articulation of speech tends to become unstable due to strain on speech-related muscles, and that causes degradation of speech recognition. Therefore, we use multiple acoustic frames (MAF) as an acoustic feature to solve this problem. Further, in a real environment, current speech recognition systems do not have sufficient performance due to noise influence. In addition to acoustic features, visual features are used to increase noise robustness in a real environment. However, there are recognition problems resulting from the tendency of those suffering from cerebral palsy to move their head erratically. We investigate a pose-robust audio-visual speech recognition method using an Active Appearance Model (AAM) to solve this problem for people with articulation disorders resulting from athetoid cerebral palsy. AAMs are used for face tracking to extract pose-robust facial feature points. Its effectiveness is confirmed by word recognition experiments on noisy speech of a person with articulation disorders.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"29","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE International Workshop on Multimedia Signal Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MMSP.2010.5662075","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 29

Abstract

We investigated the speech recognition of a person with articulation disorders resulting from athetoid cerebral palsy. The articulation of speech tends to become unstable due to strain on speech-related muscles, and that causes degradation of speech recognition. Therefore, we use multiple acoustic frames (MAF) as an acoustic feature to solve this problem. Further, in a real environment, current speech recognition systems do not have sufficient performance due to noise influence. In addition to acoustic features, visual features are used to increase noise robustness in a real environment. However, there are recognition problems resulting from the tendency of those suffering from cerebral palsy to move their head erratically. We investigate a pose-robust audio-visual speech recognition method using an Active Appearance Model (AAM) to solve this problem for people with articulation disorders resulting from athetoid cerebral palsy. AAMs are used for face tracking to extract pose-robust facial feature points. Its effectiveness is confirmed by word recognition experiments on noisy speech of a person with articulation disorders.
利用AAM和MAF对发音障碍患者进行多模态语音识别
我们调查了一个人的语音识别与衔接障碍导致的动脉样脑瘫。由于言语相关肌肉的紧张,言语的发音会变得不稳定,从而导致言语识别能力的下降。因此,我们使用多声框架(MAF)作为声学特征来解决这个问题。此外,在真实环境中,由于噪声的影响,当前的语音识别系统没有足够的性能。除了声学特征外,视觉特征还用于提高真实环境中的噪声鲁棒性。然而,脑瘫患者往往会不规律地移动头部,从而导致识别问题。我们研究了一种使用主动外观模型(AAM)的姿态鲁棒性视听语音识别方法,以解决由动脉样脑瘫引起的发音障碍患者的这一问题。aam用于人脸跟踪,提取具有姿态鲁棒性的人脸特征点。通过对发音障碍患者嘈杂言语的识别实验,验证了该方法的有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信