Frequency Domain Analysis of MFCC Feature Extraction in Children’s Speech Recognition System

Risanuri Hidayat
{"title":"Frequency Domain Analysis of MFCC Feature Extraction in Children’s Speech Recognition System","authors":"Risanuri Hidayat","doi":"10.20895/infotel.v14i1.740","DOIUrl":null,"url":null,"abstract":"Abstract —The research on speech recognition systems currently focuses on the analysis of robust speech recognition systems. When the speech signals are combined with noise, the recognition system becomes distracted, struggling to identify the speech sounds. Therefore, the development of a robust speech recognition system continues to be carried out. The principle of a robust speech recognition system is to eliminate noise from the speech signals and restore the original information signals. In this paper, researchers conducted a frequency domain analysis on one stage of the Mel Frequency Cepstral Coefficients (MFCC) process, the Fast Fourier Transform (FFT), in children's speech recognition system. The FTT analysis in the feature extraction process determined the effect of frequency value characteristics utilized in the FFT output on the noise disruption. The analysis method was designed into three scenarios based on the value of the employed FFT points. The differences between scenarios were based on the number of shared FFT points. All FFT points were divided into four, three, and two parts in the first, second, and third scenarios, respectively. This study utilized children's speech data from the isolated TIDIGIT English digit corpus. As comparative data, the noise was added manually to simulate real-world conditions. The results showed that using a particular frequency portion following the scenario designed on MFCC affected the recognition system performance, which was relatively significant on the noisy speech data. The designed method in the scenario 3 (C1) version generated the highest accuracy, exceeded the accuracy of the conventional MFCC method. The average accuracy in the scenario 3 (C1) method increased by 1% more than all the tested noise types. Using various noise intensity values (SNR), the testing process indicates that scenario 3 (C1) generates a higher accuracy than conventional MFCC in all tested SNR values. It proves that the selection of specific frequency utilized in MFCC feature extraction significantly affects the recognition accuracy in a noisy speech.","PeriodicalId":30672,"journal":{"name":"Jurnal Infotel","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2022-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Jurnal Infotel","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.20895/infotel.v14i1.740","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Research on speech recognition systems currently focuses on the analysis of robust speech recognition. When speech signals are mixed with noise, the performance of the recognition system degrades and it struggles to identify the speech sounds, so the development of robust speech recognition systems remains an active effort. The principle of a robust speech recognition system is to suppress noise in the speech signals and recover the original information-bearing signal. In this paper, the researchers conducted a frequency-domain analysis of one stage of the Mel Frequency Cepstral Coefficients (MFCC) process, the Fast Fourier Transform (FFT), in a children's speech recognition system. The FFT analysis in the feature extraction process determined the effect of the frequency characteristics used in the FFT output on noise disruption. The analysis method was designed as three scenarios based on how the FFT points were used; the scenarios differ in how the FFT points were partitioned. All FFT points were divided into four, three, and two parts in the first, second, and third scenarios, respectively. This study used children's speech data from the isolated TIDIGIT English digit corpus. For comparison, noise was added manually to simulate real-world conditions. The results showed that using a particular frequency portion, following the scenario designed for MFCC, affected recognition performance, with a relatively significant effect on noisy speech data. The scenario 3 (C1) version of the designed method produced the highest accuracy, exceeding that of the conventional MFCC method; its average accuracy increased by roughly 1% across all the tested noise types. Testing with various noise intensity values (SNR) indicates that scenario 3 (C1) achieves higher accuracy than conventional MFCC at all tested SNR values. This demonstrates that the selection of the specific frequency range used in MFCC feature extraction significantly affects recognition accuracy for noisy speech.
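To make the scenario idea concrete, the sketch below (not taken from the paper) shows how noise can be mixed into clean speech at a chosen SNR and how an MFCC-style pipeline might keep only one portion of the FFT bins before the mel filter bank, loosely mirroring the division of the FFT points into two parts in scenario 3 (C1). The function names, the 16 kHz sampling rate, the 512-point FFT, and the use of librosa/scipy are illustrative assumptions, not details from the paper.

```python
# Illustrative sketch only: this is not the authors' implementation.
import numpy as np
import scipy.fftpack
import librosa


def add_noise_at_snr(speech, noise, snr_db):
    """Mix noise into clean speech at a target SNR in dB (assumed setup)."""
    noise = np.resize(noise, speech.shape)                 # match lengths
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
    return speech + scale * noise


def mfcc_with_fft_portion(signal, sr=16000, n_fft=512, n_parts=2, part=0,
                          n_mels=26, n_mfcc=13):
    """MFCC-like features that keep only one of `n_parts` portions of the
    FFT bins, zeroing the rest (a guess at the 'frequency portion' idea)."""
    # Short-time power spectrum.
    spec = np.abs(librosa.stft(signal, n_fft=n_fft,
                               hop_length=n_fft // 2)) ** 2

    # Keep only the selected portion of FFT bins; zero the remainder so the
    # mel filter bank dimensions stay unchanged.
    n_bins = spec.shape[0]                                 # 1 + n_fft // 2
    lo = part * n_bins // n_parts
    hi = (part + 1) * n_bins // n_parts
    mask = np.zeros((n_bins, 1))
    mask[lo:hi] = 1.0
    spec = spec * mask

    # Mel filter bank, log compression, and DCT to cepstral coefficients.
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    log_mel = np.log(mel_fb @ spec + 1e-10)
    mfcc = scipy.fftpack.dct(log_mel, axis=0, norm="ortho")[:n_mfcc]
    return mfcc.T                                          # (frames, n_mfcc)


# Example usage (file paths are placeholders):
# speech, sr = librosa.load("digit_utterance.wav", sr=16000)
# noise, _ = librosa.load("babble_noise.wav", sr=16000)
# noisy = add_noise_at_snr(speech, noise, snr_db=10)
# feats = mfcc_with_fft_portion(noisy, sr=sr, n_parts=2, part=0)
```

Under these assumptions, keeping only the lower half of the spectrum (part=0 with n_parts=2) corresponds to one way of dividing the FFT points into two parts as in the third scenario; which portion is retained is the experimental variable the paper studies.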