Collaborative Training of Acoustic Encoder for Recognizing the Impaired Children Speech

2022 Fifth College of Science International Conference of Recent Trends in Information Technology (CSCTIT). Pub Date: 2022-11-15. DOI: 10.1109/CSCTIT56299.2022.10145742

S. Shareef, Yusra Faisal Mohammed



Abstract

Encoder-decoder models have become an effective and increasingly popular approach to sequence learning tasks such as automatic speech recognition (ASR) because they simplify the processing pipeline. The traditional decoder-based approach usually learns a sequence-to-sequence mapping from the source speech to target units (characters, words, or phonemes). However, its performance degrades when the training data are insufficient or defective. In impaired children's speech recognition, impaired pronunciation leaves the audio signal with little and disordered information, so the signal must be processed at several levels to improve the model's ability. This work therefore adopts an encoder structure intended to improve the encoder's ability to encode the input sequence into a sequence of hidden representations. The paper proposes collaborative training of sequence predictive models as an acoustic encoder for ASR systems for Arabic impaired children's speech. Each sequence model maps the input sequence to only one phoneme of the output sequence. The encoder then aligns the output phonemes into a single output sequence. Experimental results on the Impaired Children's Speech of Arabic dataset show that the collaboratively trained acoustic encoder, which aligns the phoneme sequence, can provide up to a 10% relative improvement in accuracy over traditional methods. The authors note that this paper is the first to address the recognition of impaired children's speech at the level of Arabic text.
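The abstract gives no implementation details, so the following is only a minimal sketch of the idea it describes: several sequence models, each mapping the whole input to a single phoneme, whose outputs the encoder places in order to form one output sequence. Every name here (`PHONEMES`, `SlotPredictor`, `encode`, `relative_improvement`) is hypothetical, and the mean-pooled linear scorer is an assumed stand-in for a trained sequence model, not the authors' architecture.

```python
from typing import List, Sequence

# Hypothetical toy phoneme inventory; the paper's inventory is not given.
PHONEMES = ["b", "a", "t"]


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average the acoustic frames into one fixed-size feature vector."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n for d in range(dim)]


class SlotPredictor:
    """Assumed stand-in for one sequence model: whole input -> ONE phoneme."""

    def __init__(self, weights: Sequence[Sequence[float]]):
        # One weight row per phoneme in PHONEMES.
        self.weights = weights

    def predict(self, frames: Sequence[Sequence[float]]) -> str:
        pooled = mean_pool(frames)
        scores = [sum(w * x for w, x in zip(row, pooled)) for row in self.weights]
        return PHONEMES[scores.index(max(scores))]


def encode(frames: Sequence[Sequence[float]],
           predictors: Sequence[SlotPredictor]) -> List[str]:
    """Place each model's single phoneme at its position in one output sequence."""
    return [predictor.predict(frames) for predictor in predictors]


def relative_improvement(baseline_acc: float, new_acc: float) -> float:
    """Relative (not absolute) gain in accuracy over a baseline."""
    return (new_acc - baseline_acc) / baseline_acc


# Toy input: two 2-dimensional frames (illustrative values only).
frames = [[1.0, 0.0], [0.0, 1.0]]
predictors = [
    SlotPredictor([[1, 1], [0, 0], [0, 0]]),  # position 0 favours "b"
    SlotPredictor([[0, 0], [1, 1], [0, 0]]),  # position 1 favours "a"
    SlotPredictor([[0, 0], [0, 0], [1, 1]]),  # position 2 favours "t"
]
sequence = encode(frames, predictors)
```

With these hand-picked weights, `sequence` is `["b", "a", "t"]`. The helper `relative_improvement` only illustrates what a relative gain means: with made-up accuracies, `relative_improvement(0.60, 0.66)` is about 0.10, which is a 10% relative improvement but only 6 points absolute. These numbers are not results from the paper.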

