Improving the computational complexity and word recognition rate for dysarthria speech using robust frame selection algorithm
Authors: Garima Vyas, M. Dutta, J. Prinosil
Journal: International Journal of Signal and Imaging Systems Engineering, volume 10 1, page 136
Publication date: 2017-08-25
Publication type: Journal Article
DOI: https://doi.org/10.1504/IJSISE.2017.10006783
Impact factor: 0.6; JCR: Q3 (Engineering)
Open access: no
Citations: 0
Abstract
Dysarthria is a motor speech disorder caused by neurological damage to the motor speech system. In this paper, a robust frame selection (RFS) algorithm is used to recognise dysarthric speech in less time. The algorithm identifies the more informative frames, which reduces the size of the feature matrix used for recognition. This yields a significant reduction in computational cost without compromising the word recognition rate (WRR), which may support real-time applications. A combination of four prosodic features, namely Mel frequency cepstral coefficients (MFCCs), log energy per frame, differential MFCCs and double differential MFCCs, is used to train and test Hidden Markov Models (HMMs) for speech recognition. Several experiments were performed on high, medium and low intelligibility audio clips with a vocabulary of 29 isolated words. The processing time of the whole system is reduced by up to 54.8% relative to the same system without RFS. The proposed scheme is independent of gender, speaker and age.
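The abstract does not state the criterion RFS uses to decide which frames are informative, so the following is only a minimal sketch of the general idea, assuming a hypothetical rule that keeps the frames with the highest short-time log energy. The function names, the `keep_ratio` parameter, the frame sizes and the toy signal are all illustrative assumptions, and log energy stands in for the full MFCC vector to keep the example dependency-free. It shows how dropping frames shrinks the feature matrix before the differential and double differential features are appended.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into overlapping frames (trailing partial frame dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def log_energy(frame, floor=1e-10):
    """Natural log of the frame's summed squared amplitude, floored to avoid log(0)."""
    return math.log(max(sum(x * x for x in frame), floor))

def select_frames(frames, keep_ratio=0.5):
    """Hypothetical selection rule (NOT the paper's RFS criterion): keep the
    highest-energy fraction of frames and return their indices in time order."""
    n_keep = max(1, int(len(frames) * keep_ratio))
    energies = [log_energy(f) for f in frames]
    ranked = sorted(range(len(frames)), key=lambda i: energies[i], reverse=True)
    return sorted(ranked[:n_keep])

def delta(features):
    """Central difference along time with edge replication: (f[t+1] - f[t-1]) / 2."""
    n = len(features)
    out = []
    for t in range(n):
        prev, nxt = features[max(t - 1, 0)], features[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out

# Toy signal: silence, a 100 Hz tone burst, then silence (8 kHz sample rate assumed).
signal = ([0.0] * 400
          + [math.sin(2 * math.pi * 100 * n / 8000) for n in range(800)]
          + [0.0] * 400)

frames = frame_signal(signal, frame_len=200, hop=100)
kept = select_frames(frames, keep_ratio=0.5)

# Static features per kept frame; a real system would append MFCCs here.
static = [[log_energy(frames[i])] for i in kept]
d1 = delta(static)   # differential features
d2 = delta(d1)       # double differential features
feature_matrix = [s + a + b for s, a, b in zip(static, d1, d2)]

print(len(frames), "frames before selection,", len(feature_matrix), "after")
```

In this toy case the selected frames are the ones fully inside the tone burst, so the silent and partially silent frames never reach the recogniser. Any saving on real dysarthric speech depends on the actual selection criterion and on the HMM decoding cost, neither of which this sketch models.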