{"title":"Performance of Speech Recognition Algorithms in Musical Speech used for Speech-Language Pathology Rehabilitation","authors":"Pedram Aliniaye Asli, A. Zumbansen","doi":"10.1109/MeMeA57477.2023.10171898","DOIUrl":null,"url":null,"abstract":"Musical speech in speech-language pathology rehabilitation is the production of speech following simple musical (rhythmic or melodic) patterns. This type of speech is used to facilitate speech processing in patients. In this study, we examined the performance of current automatic speech recognition (ASR) algorithms in recognizing normal and musical speech. From a first list of 28 identified algorithms, 24 were excluded for reasons such as low accuracy rate, high computational cost, high price, difficulty of use, long runtime, implementation problems. The four algorithms included were those from Amazon Web Services (AWS Transcribe), Google Speech Recognition, IBM Watson and Rev AI. We ran the selected algorithms on 60 sentences recorded under four speech conditions (Melodic; Rhythmic; Regular Slow; and Regular Normal). All algorithms did perfectly in recognizing the normal speech. The two algorithms with the best performance in musical speech (rhythmic and melodic speech) were AWS Transcribe and IBM Watson, both providing recognition accuracy above 98%. When adding moderate level of white noise and reverberation to the stimuli, AWS Transcribe remained with an acceptable (> 70%) or satisfactory (> 95%) ASR performance. These results may guide the development of software that use ASR to enable patients to undergo self-directed sessions of music-based speech-language rehabilitation, such as the melodic intonation therapy for post-stroke aphasia. The possibility to recognize musical speech allows to compare a patient’s performance to corresponding target phrases and provide feedback in the absence of a clinician. Given the recommended high intensity of treatment and the limited availability of speech-language pathologists, such software would be highly valuable to our healthcare systems.","PeriodicalId":191927,"journal":{"name":"2023 IEEE International Symposium on Medical Measurements and Applications (MeMeA)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE International Symposium on Medical Measurements and Applications (MeMeA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MeMeA57477.2023.10171898","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Musical speech in speech-language pathology rehabilitation is the production of speech following simple musical (rhythmic or melodic) patterns. This type of speech is used to facilitate speech processing in patients. In this study, we examined the performance of current automatic speech recognition (ASR) algorithms in recognizing normal and musical speech. From an initial list of 28 identified algorithms, 24 were excluded for reasons such as low accuracy, high computational cost, high price, difficulty of use, long runtime, or implementation problems. The four algorithms retained were those from Amazon Web Services (AWS Transcribe), Google Speech Recognition, IBM Watson, and Rev AI. We ran the selected algorithms on 60 sentences recorded under four speech conditions (Melodic, Rhythmic, Regular Slow, and Regular Normal). All algorithms recognized normal speech perfectly. The two algorithms with the best performance on musical speech (rhythmic and melodic speech) were AWS Transcribe and IBM Watson, both providing recognition accuracy above 98%. When a moderate level of white noise and reverberation was added to the stimuli, AWS Transcribe maintained acceptable (> 70%) or satisfactory (> 95%) ASR performance. These results may guide the development of software that uses ASR to enable patients to undergo self-directed sessions of music-based speech-language rehabilitation, such as melodic intonation therapy for post-stroke aphasia. The ability to recognize musical speech makes it possible to compare a patient's performance against corresponding target phrases and to provide feedback in the absence of a clinician. Given the recommended high intensity of treatment and the limited availability of speech-language pathologists, such software would be highly valuable to our healthcare systems.
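The feedback loop mentioned at the end of the abstract can be sketched briefly. The snippet below is not from the paper: it compares an ASR transcript against a target phrase using a word-level edit distance and maps the resulting accuracy onto the qualitative bands quoted above (> 95% satisfactory, > 70% acceptable). The function names, thresholds-as-code, and sample sentences are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumption, not the paper's code): score a patient's attempt
# by comparing the ASR transcript to the target phrase at the word level.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance, normalized by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[-1][-1] / max(len(ref), 1)


def score_attempt(target: str, transcript: str) -> str:
    """Map accuracy onto the abstract's bands: >95% satisfactory, >70% acceptable."""
    accuracy = 1.0 - word_error_rate(target, transcript)
    if accuracy > 0.95:
        return "satisfactory"
    if accuracy > 0.70:
        return "acceptable"
    return "needs another attempt"


if __name__ == "__main__":
    target = "the sun is shining over the hills"      # hypothetical target phrase
    transcript = "the sun is shining over the hill"   # hypothetical ASR output
    print(score_attempt(target, transcript))          # -> "acceptable" (1 word error in 7)
```

In a self-directed therapy session, the transcript would come from whichever ASR service is used (e.g., AWS Transcribe, which the study found most robust to noise and reverberation), and the returned band could drive the feedback shown to the patient.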