Improved acoustic modeling for automatic dysarthric speech recognition

R. Sriranjani, M. Reddy, S. Umesh
Published in: 2015 Twenty First National Conference on Communications (NCC), 2015-04-16
DOI: 10.1109/NCC.2015.7084856 (https://doi.org/10.1109/NCC.2015.7084856)
Citations: 10

Abstract

Dysarthria is a neuromuscular disorder that occurs due to improper coordination of the speech musculature. To improve the quality of life of people with speech disorders, assistive technology based on automatic speech recognition (ASR) systems is gaining importance. Because it is difficult for dysarthric speakers to provide sufficient data, data insufficiency is one of the major problems in building an efficient dysarthric ASR system. In this paper, we address this issue by pooling data from an unimpaired speech database. A feature-space maximum likelihood linear regression (fMLLR) transformation is then applied to the pooled data and the dysarthric data to normalize the effect of inter-speaker variability. The acoustic model built using the combined features (acoustically transformed dysarthric + pooled features) gives relative improvements of 18.09% and 50.00% over the baseline system for the Nemours database and the Universal Access speech (digit set) database, respectively.
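
The pooling and normalization steps described above can be illustrated with a minimal sketch. fMLLR applies an affine transform x' = A x + b to each feature vector, with one (A, b) pair per speaker. In a real system the transforms are estimated by maximum likelihood against an acoustic model. Here they are hand-picked toy values, and the function names, speaker labels, and 2-dimensional features are all assumptions made for illustration, not details taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def apply_fmllr(frames: Sequence[Sequence[float]], A: Matrix, b: Vector) -> List[Vector]:
    """Apply the affine feature-space transform x' = A x + b to every frame."""
    transformed = []
    for x in frames:
        transformed.append(
            [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
             for row, b_i in zip(A, b)]
        )
    return transformed


def pool_transformed(
    features_by_speaker: Dict[str, List[Vector]],
    transforms: Dict[str, Tuple[Matrix, Vector]],
) -> List[Vector]:
    """Transform each speaker's frames with that speaker's (A, b), then pool them."""
    pooled: List[Vector] = []
    for speaker, frames in features_by_speaker.items():
        A, b = transforms[speaker]
        pooled.extend(apply_fmllr(frames, A, b))
    return pooled


# Toy data: hypothetical speakers with 2-dimensional features.
features = {
    "dysarthric_spk": [[1.0, 2.0], [3.0, 4.0]],
    "unimpaired_spk": [[0.5, 0.5]],
}
# Hand-picked transforms standing in for ML-estimated fMLLR matrices.
transforms = {
    "dysarthric_spk": ([[2.0, 0.0], [0.0, 1.0]], [0.0, 1.0]),
    "unimpaired_spk": ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),  # identity
}

pooled = pool_transformed(features, transforms)
print(pooled)
```

The pooled list plays the role of the combined training features. The abstract does not state which toolkit or estimation procedure was used, so this sketch covers only how a known transform is applied and how the transformed features are combined.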