Collaborative Training of Acoustic Encoder for Recognizing the Impaired Children Speech

S. Shareef, Yusra Faisal Mohammed
DOI: 10.1109/CSCTIT56299.2022.10145742
Published in: 2022 Fifth College of Science International Conference of Recent Trends in Information Technology (CSCTIT)
Publication date: 2022-11-15
URL: https://doi.org/10.1109/CSCTIT56299.2022.10145742

Abstract

Encoder-decoder models have become an effective and increasingly popular approach to sequence learning tasks such as automatic speech recognition (ASR) because they simplify the processing pipeline. The traditional decoder-based approach usually learns a sequence-to-sequence mapping from the source speech to target units (characters, words, or phonemes). However, without sufficient, defect-free training data, its performance degrades. In impaired children's speech recognition, impaired pronunciation leaves the audio signal with sparse and disordered information, which must be processed at several levels to improve the model's ability. This work therefore adopts an encoder structure designed to improve the encoder's ability to encode the input sequence into a sequence of hidden representations. The paper proposes collaborative training of sequence predictive models as an acoustic encoder for ASR of Arabic impaired children's speech. Each sequence model maps the input sequence to only one phoneme of the output sequence, and the encoder then aligns the output phonemes into a single output sequence. Experimental results on the Impaired Children's Speech of Arabic dataset show that the collaboratively trained acoustic encoder, which aligns the phoneme sequence, provides up to a 10% relative improvement in accuracy over traditional methods. This is notable because this paper is the first to address impaired children's speech recognition at the level of Arabic text.
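The structure described in the abstract, with one predictive model per output position and an encoder that assembles the predicted phonemes into one sequence, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the abstract does not say what the sequence models are or how they collaborate, so each model here is a plain linear softmax classifier over mean-pooled frames, and "collaborative" training is read simply as updating all position models on the same utterance in one pass with a summed loss. The names `PositionModel`, `CollaborativeEncoder`, `PHONEMES` and the two-utterance dataset are hypothetical.

```python
import math

PHONEMES = ["a", "b"]  # hypothetical toy phoneme inventory


def mean_pool(frames):
    """Average the feature frames of one utterance into a single vector."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class PositionModel:
    """Stand-in for one sequence model: predicts the phoneme at ONE position."""

    def __init__(self, dim, n_classes):
        self.w = [[0.0] * dim for _ in range(n_classes)]

    def probs(self, x):
        return softmax([sum(wc[d] * x[d] for d in range(len(x))) for wc in self.w])

    def sgd_step(self, x, target, lr):
        """One cross-entropy gradient step; returns the loss before the update."""
        p = self.probs(x)
        for c, wc in enumerate(self.w):
            grad = p[c] - (1.0 if c == target else 0.0)
            for d in range(len(x)):
                wc[d] -= lr * grad * x[d]
        return -math.log(p[target])


class CollaborativeEncoder:
    """Holds one model per output position and orders their predictions."""

    def __init__(self, dim, n_positions, n_classes):
        self.models = [PositionModel(dim, n_classes) for _ in range(n_positions)]

    def train(self, data, epochs=50, lr=0.5):
        total = 0.0
        for _ in range(epochs):
            total = 0.0  # summed loss over all position models for this epoch
            for frames, targets in data:
                x = mean_pool(frames)
                # Assumed form of "collaboration": every model sees the same
                # utterance in the same pass and contributes to one total loss.
                for model, t in zip(self.models, targets):
                    total += model.sgd_step(x, PHONEMES.index(t), lr)
        return total

    def encode(self, frames):
        """Each model emits one phoneme; outputs are placed in position order."""
        x = mean_pool(frames)
        out = []
        for model in self.models:
            p = model.probs(x)
            out.append(PHONEMES[p.index(max(p))])
        return out


# Hypothetical toy data: (feature frames, target phoneme sequence)
data = [
    ([[1.0, 0.0], [1.0, 0.0]], ["b", "a"]),
    ([[0.0, 1.0], [0.0, 1.0]], ["a", "b"]),
]
enc = CollaborativeEncoder(dim=2, n_positions=2, n_classes=len(PHONEMES))
final_loss = enc.train(data)
print(enc.encode(data[0][0]), enc.encode(data[1][0]))
```

In this sketch the position models share only the input and the training pass. A closer reading of the paper may involve shared parameters or a learned alignment step, neither of which the abstract specifies.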