A Comparison of Time-Frequency Distributions for Deep Learning-Based Speech Assessment of Aphasic Patients

Akshay Kumar, S. Mahmoud, Yin Wang, S. Faisal, Qiang Fang
2022 15th International Conference on Human System Interaction (HSI), published 2022-07-28
DOI: 10.1109/HSI55341.2022.9869452 (https://doi.org/10.1109/HSI55341.2022.9869452)
Citations: 2

Abstract

Speech impairment assessment is an essential part of the rehabilitation of aphasic patients. Because the number of stroke incidents increases year after year, automatic speech impairment assessment (ASIA) methods are needed. Deep learning, combined with time-frequency distribution (TFD) representations of speech data, is a promising basis for such methods. Before further progress can be made, however, the candidate TFDs must be assessed for their effectiveness in ASIA. This paper therefore assesses and compares several TFD methods for ASIA of Mandarin speech. Several state-of-the-art computer vision convolutional neural network models were trained on TFDs of speech data from thirty-four healthy participants and twelve aphasic patients. The automatic speech recognition rate was used as the measure of TFD performance. Mel spectrogram-based TFDs performed significantly better for automatic speech recognition than the previously used Hyperbolic-T distribution TFDs, which indicates that using Mel spectrogram TFDs in place of Hyperbolic-T distribution TFDs can improve ASIA performance. These findings will help improve the performance of deep learning- and TFD-based ASIA methods.
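To make the Mel spectrogram TFD concrete, the following is a minimal sketch of how a log-Mel spectrogram might be computed from a speech signal before being passed to a CNN as an image. It is not the authors' implementation: the abstract gives no parameters, so the frame length, hop size, number of Mel bands, Hann window, HTK-style Mel formula, and all function names are assumptions made for illustration. It uses only the Python standard library with a naive DFT so that it is self-contained; a practical pipeline would normally rely on an FFT library.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """HTK-style Hz -> mel conversion (one common convention, assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of one Hann-windowed frame via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    max_mel = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies (Hz), from 0 Hz up to the Nyquist frequency
    edges = [mel_to_hz(max_mel * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(n_mels):
        lo, centre, hi = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (centre - lo)
            falling = (hi - f) / (hi - centre)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def log_mel_spectrogram(signal, sample_rate, n_fft=256, hop=128, n_mels=20):
    """Return a list of frames, each a list of n_mels log-energies in dB."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    spec = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
        # Small offset avoids log(0) for silent bands
        spec.append([10.0 * math.log10(e + 1e-10) for e in energies])
    return spec
```

Calling `log_mel_spectrogram(samples, sample_rate)` on a list of audio samples yields a frames-by-bands matrix that can be rendered or stacked as a 2-D image. The Mel spacing gives finer resolution at low frequencies than at high ones, which is the main property that distinguishes this representation from a linear-frequency spectrogram.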