Improved acoustic modeling for automatic dysarthric speech recognition

2015 Twenty First National Conference on Communications (NCC)
Pub Date: 2015-04-16
DOI: 10.1109/NCC.2015.7084856

R. Sriranjani, M. Reddy, S. Umesh

Citations: 10

Abstract

Dysarthria is a neuromuscular disorder that occurs due to improper coordination of the speech musculature. To improve the quality of life of people with speech disorders, assistive technologies based on automatic speech recognition (ASR) systems are gaining importance. Since it is difficult for dysarthric speakers to provide sufficient data, data insufficiency is one of the major problems in building an efficient dysarthric ASR system. In this paper, we address this issue by pooling data from an unimpaired speech database. Feature-space maximum likelihood linear regression (fMLLR) transformation is then applied to the pooled data and the dysarthric data to normalize the effect of inter-speaker variability. The acoustic model built using the combined features (acoustically transformed dysarthric features + pooled features) gives relative improvements of 18.09% and 50.00% over the baseline system for the Nemours database and the Universal Access speech (digit set) database, respectively.
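The abstract gives no implementation details, so the following is only a minimal Python sketch of the data flow it describes, under stated assumptions. Each speaker's feature frames, in both the dysarthric set and the pooled unimpaired set, are passed through that speaker's own fMLLR affine transform x' = Ax + b. The normalized frames are then concatenated into one training set. The per-speaker granularity is an assumption. Estimating A and b (by maximizing likelihood under a seed acoustic model) is not shown; the transforms are taken as given. The helper names and speaker IDs are hypothetical. The block also shows how a relative improvement figure is conventionally computed, assuming the metric is word error rate (WER).

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
# Hypothetical representation: one (A, b) fMLLR transform per speaker ID.
Transforms = Dict[str, Tuple[Matrix, Vector]]


def apply_fmllr(frames: List[Vector], A: Matrix, b: Vector) -> List[Vector]:
    """Apply an already-estimated fMLLR transform x' = A x + b to every frame."""
    dim = len(b)
    out: List[Vector] = []
    for x in frames:
        if len(x) != dim:
            raise ValueError("frame dimension does not match transform")
        out.append([sum(A[i][j] * x[j] for j in range(dim)) + b[i] for i in range(dim)])
    return out


def build_training_set(
    dysarthric: Dict[str, List[Vector]],
    pooled: Dict[str, List[Vector]],
    transforms: Transforms,
) -> List[Vector]:
    """Hypothetical helper: normalize each speaker with their own transform,
    then concatenate dysarthric and pooled frames into one training set."""
    combined: List[Vector] = []
    for corpus in (dysarthric, pooled):
        for speaker, frames in corpus.items():
            A, b = transforms[speaker]
            combined.extend(apply_fmllr(frames, A, b))
    return combined


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer
```

For example, with made-up WERs (not the paper's numbers), `relative_improvement(20.0, 10.0)` returns `50.0`. Note that a relative figure depends on the baseline: halving a small error rate and halving a large one both count as a 50% relative improvement.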



