A Comparison of Time-Frequency Distributions for Deep Learning-Based Speech Assessment of Aphasic Patients

2022 15th International Conference on Human System Interaction (HSI). Pub Date: 2022-07-28. DOI: 10.1109/HSI55341.2022.9869452

Akshay Kumar, S. Mahmoud, Yin Wang, S. Faisal, Qiang Fang


Citations: 2

Abstract

Speech impairment assessment is an essential part of the rehabilitation of aphasic patients. With the number of stroke incidents increasing year after year, automatic speech impairment assessment (ASIA) methods are needed. Deep learning combined with time-frequency distribution (TFD) representations of speech data is a promising approach to developing ASIA methods. Before further progress can be made, however, the effectiveness of different TFDs for ASIA must be assessed. This paper therefore assessed and compared several TFD methods for ASIA of Mandarin speech. Several state-of-the-art computer vision convolutional neural network models were trained on TFDs of speech data from thirty-four healthy participants and twelve aphasic patients. The automatic speech recognition rate was used as the measure of TFD performance. Results showed that Mel spectrogram-based TFDs perform significantly better for automatic speech recognition than the previously used Hyperbolic-T distribution TFDs, indicating that using Mel spectrogram TFDs in place of Hyperbolic-T distribution TFDs can improve ASIA performance. The findings will help improve the performance of deep learning- and TFD-based ASIA methods.
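
The abstract does not say how the Mel spectrograms were computed, so the following is only a rough illustration of what a Mel spectrogram TFD is, not the authors' pipeline. It is a minimal, dependency-free Python sketch: the signal is split into overlapping frames, each frame is Hann-windowed and transformed with a naive DFT, and the power spectrum is pooled through triangular filters spaced evenly on the mel scale. All function names and parameter values (sample rate, frame length, hop, number of bands, the HTK-style mel formula) are assumptions chosen for illustration, and a synthetic tone stands in for a speech recording.

```python
import cmath
import math


def hz_to_mel(hz):
    # HTK-style mel scale (an assumption; other mel variants exist).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters evenly spaced in mel, shape [n_mels][n_fft // 2 + 1]."""
    n_bins = n_fft // 2 + 1
    max_mel = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies in Hz: left edge, filter centres, right edge.
    edges = [mel_to_hz(max_mel * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Power spectrum of one Hann-windowed frame via a naive DFT (slow but simple)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_spectrogram(signal, sample_rate, n_fft=256, hop=128, n_mels=20):
    """Return a list of frames, each holding n_mels log-mel energies in dB."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
        # Small offset avoids log(0) for empty bands.
        frames.append([10.0 * math.log10(e + 1e-10) for e in energies])
    return frames


# Synthetic 1 kHz tone standing in for a speech recording (illustrative only).
SAMPLE_RATE = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * t / SAMPLE_RATE) for t in range(1024)]
spec = mel_spectrogram(tone, SAMPLE_RATE)
# Index of the mel band with the most energy in the first frame.
peak_band = max(range(len(spec[0])), key=lambda m: spec[0][m])
```

The resulting frames-by-bands array is the kind of two-dimensional, image-like input that a computer vision CNN can consume. In practice an FFT-based audio library would replace the naive DFT used here.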


