An investigation of audio-visual speech recognition as applied to multimedia speech therapy applications

V. Georgopoulos
Proceedings IEEE International Conference on Multimedia Computing and Systems, 1999-06-07
DOI: 10.1109/MMCS.1999.779249 (https://doi.org/10.1109/MMCS.1999.779249)
Citations: 8

Abstract

A multimedia speech therapy system should support customized therapy for different speech problems and for patients of different ages. Its speech recognition must be designed to work with high inter- and intra-speaker variability. In addition to displaying text on a screen, recording the patient's voice reading the text, analyzing the recorded speech signal, and performing speech recognition (including identification of speech irregularities and tracking of patient progress), the system should be capable of analyzing the visual signal of the patient's speech and of providing visual as well as audio feedback. This implies that synchronization of the different media is important in realizing effective multimedia speech therapy applications. To perform the speech recognition and identification tasks, time-frequency analysis and neural networks are proposed, with integration of visual information.
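The abstract does not specify how the time-frequency analysis or the audio-visual integration is carried out. As a rough illustration only, the sketch below assumes a short-time Fourier transform for the time-frequency step and simple feature concatenation (early fusion) for the integration step, with each audio frame paired with the video frame nearest in time to keep the two media synchronized. The function names, frame sizes, and the two "lip measurement" columns are hypothetical and are not taken from the paper.

```python
import numpy as np

def stft_magnitude(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: one row per frame, one column per frequency bin."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def align_visual(visual, visual_fps, n_audio_frames, sample_rate, hop):
    """For each audio frame, pick the video frame nearest in time."""
    audio_times = np.arange(n_audio_frames) * hop / sample_rate
    idx = np.round(audio_times * visual_fps).astype(int)
    idx = np.clip(idx, 0, len(visual) - 1)
    return visual[idx]

def fuse(signal, sample_rate, visual, visual_fps, frame_len=256, hop=128):
    """Early fusion: concatenate audio and time-aligned visual features per frame."""
    audio_feats = stft_magnitude(signal, frame_len, hop)
    visual_feats = align_visual(visual, visual_fps, len(audio_feats),
                                sample_rate, hop)
    return np.hstack([audio_feats, visual_feats])

# Synthetic stand-ins (assumptions): a 1-second 440 Hz tone as "speech" and
# 25 fps video reduced to two hypothetical lip measurements per frame.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440 * t)
visual = np.random.default_rng(0).random((25, 2))

features = fuse(signal, sample_rate, visual, visual_fps=25)
print(features.shape)  # (frames, frequency bins + visual measurements)
```

Each row of `features` could then serve as one input vector to a neural network classifier. Early fusion is only one option. Combining separate audio and visual classifiers at the decision level is another, and the abstract does not say which the author intends.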