{"title":"Performance of Speech Recognition Algorithms in Musical Speech used for Speech-Language Pathology Rehabilitation","authors":"Pedram Aliniaye Asli, A. Zumbansen","doi":"10.1109/MeMeA57477.2023.10171898","DOIUrl":null,"url":null,"abstract":"Musical speech in speech-language pathology rehabilitation is the production of speech following simple musical (rhythmic or melodic) patterns. This type of speech is used to facilitate speech processing in patients. In this study, we examined the performance of current automatic speech recognition (ASR) algorithms in recognizing normal and musical speech. From a first list of 28 identified algorithms, 24 were excluded for reasons such as low accuracy rate, high computational cost, high price, difficulty of use, long runtime, implementation problems. The four algorithms included were those from Amazon Web Services (AWS Transcribe), Google Speech Recognition, IBM Watson and Rev AI. We ran the selected algorithms on 60 sentences recorded under four speech conditions (Melodic; Rhythmic; Regular Slow; and Regular Normal). All algorithms did perfectly in recognizing the normal speech. The two algorithms with the best performance in musical speech (rhythmic and melodic speech) were AWS Transcribe and IBM Watson, both providing recognition accuracy above 98%. When adding moderate level of white noise and reverberation to the stimuli, AWS Transcribe remained with an acceptable (> 70%) or satisfactory (> 95%) ASR performance. These results may guide the development of software that use ASR to enable patients to undergo self-directed sessions of music-based speech-language rehabilitation, such as the melodic intonation therapy for post-stroke aphasia. The possibility to recognize musical speech allows to compare a patient’s performance to corresponding target phrases and provide feedback in the absence of a clinician. Given the recommended high intensity of treatment and the limited availability of speech-language pathologists, such software would be highly valuable to our healthcare systems.","PeriodicalId":191927,"journal":{"name":"2023 IEEE International Symposium on Medical Measurements and Applications (MeMeA)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE International Symposium on Medical Measurements and Applications (MeMeA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MeMeA57477.2023.10171898","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Musical speech in speech-language pathology rehabilitation is the production of speech following simple musical (rhythmic or melodic) patterns. This type of speech is used to facilitate speech processing in patients. In this study, we examined the performance of current automatic speech recognition (ASR) algorithms in recognizing normal and musical speech. From an initial list of 28 identified algorithms, 24 were excluded for reasons such as low accuracy, high computational cost, high price, difficulty of use, long runtime, or implementation problems. The four algorithms retained were those from Amazon Web Services (AWS Transcribe), Google Speech Recognition, IBM Watson, and Rev AI. We ran the selected algorithms on 60 sentences recorded under four speech conditions (Melodic, Rhythmic, Regular Slow, and Regular Normal). All algorithms recognized normal speech perfectly. The two algorithms with the best performance on musical speech (rhythmic and melodic speech) were AWS Transcribe and IBM Watson, both providing recognition accuracy above 98%. When a moderate level of white noise and reverberation was added to the stimuli, AWS Transcribe maintained acceptable (> 70%) or satisfactory (> 95%) ASR performance. These results may guide the development of software that uses ASR to enable patients to undergo self-directed sessions of music-based speech-language rehabilitation, such as melodic intonation therapy for post-stroke aphasia. The ability to recognize musical speech makes it possible to compare a patient's performance against corresponding target phrases and to provide feedback in the absence of a clinician. Given the recommended high intensity of treatment and the limited availability of speech-language pathologists, such software would be highly valuable to our healthcare systems.
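The feedback loop mentioned at the end of the abstract can be sketched briefly. The snippet below is not from the paper: it compares an ASR transcript against a target phrase using a word-level edit distance and maps the resulting accuracy onto the qualitative bands quoted above (> 95% satisfactory, > 70% acceptable). The function names, thresholds-as-code, and sample sentences are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumption, not the paper's code): score a patient's attempt
# by comparing the ASR transcript to the target phrase at the word level.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance, normalized by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[-1][-1] / max(len(ref), 1)


def score_attempt(target: str, transcript: str) -> str:
    """Map accuracy onto the abstract's bands: >95% satisfactory, >70% acceptable."""
    accuracy = 1.0 - word_error_rate(target, transcript)
    if accuracy > 0.95:
        return "satisfactory"
    if accuracy > 0.70:
        return "acceptable"
    return "needs another attempt"


if __name__ == "__main__":
    target = "the sun is shining over the hills"      # hypothetical target phrase
    transcript = "the sun is shining over the hill"   # hypothetical ASR output
    print(score_attempt(target, transcript))          # -> "acceptable" (1 word error in 7)
```

In a self-directed therapy session, the transcript would come from whichever ASR service is used (e.g., AWS Transcribe, which the study found most robust to noise and reverberation), and the returned band could drive the feedback shown to the patient.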