An investigation of audio-visual speech recognition as applied to multimedia speech therapy applications

Proceedings IEEE International Conference on Multimedia Computing and Systems. Pub Date: 1999-06-07. DOI: 10.1109/MMCS.1999.779249

V. Georgopoulos

Citations: 8

Abstract

A multimedia speech therapy system should support customized therapy for different speech problems and different age groups. Its speech recognition must be designed to work with high inter- and intra-speaker variability. In addition to displaying text on a screen, recording the patient reading the text aloud, analyzing the recorded speech signal, and performing speech recognition (including identification of speech irregularities and tracking of patient progress), the system should be capable of analyzing the visual signal of the patient's speech and providing visual as well as audio feedback. Synchronization of the different media is therefore important in realizing effective multimedia speech therapy applications. To perform the speech recognition and identification tasks, time-frequency analysis and neural networks are proposed, with integration of visual information.
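The abstract does not say which time-frequency transform or which fusion scheme is used. As a minimal sketch only, the following assumes a Hann-windowed short-time Fourier transform for the audio and simple early fusion, in which per-video-frame visual features are aligned to the audio frame rate and concatenated. The function names, the frame sizes, and the four-dimensional visual feature vector are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def stft_magnitude(signal, frame_len=256, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform.

    Assumed parameters (not from the paper): 256-sample frames, 50% overlap.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # One row per audio frame, one column per frequency bin.
    return np.abs(np.fft.rfft(frames, axis=1))

def fuse_audio_visual(audio_feats, visual_feats):
    """Early fusion: map each audio frame to the video frame covering the
    same relative time position, then concatenate the two feature vectors."""
    n_audio, n_video = audio_feats.shape[0], visual_feats.shape[0]
    idx = (np.arange(n_audio) * n_video) // n_audio
    return np.concatenate([audio_feats, visual_feats[idx]], axis=1)

# Hypothetical input: 1 s of a 1 kHz tone sampled at 8 kHz, plus 25 video
# frames with 4 placeholder lip-shape features each.
fs = 8000
t = np.arange(fs) / fs
audio = np.sin(2 * np.pi * 1000 * t)
visual = np.zeros((25, 4))

spec = stft_magnitude(audio)             # shape (61, 129)
fused = fuse_audio_visual(spec, visual)  # shape (61, 133)
```

Each row of `fused` could then serve as one input vector to a neural network classifier. The index mapping is a stand-in for the media synchronization the abstract describes; a real system would align the streams using capture timestamps.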


