Comparison of Feature Extraction for Accent Dependent Thai Speech Recognition System

2018 IEEE Seventh International Conference on Communications and Electronics (ICCE). Pub Date: 2018-07-01. DOI: 10.1109/CCE.2018.8465705

S. Tantisatirapong, Chalisa Prasoproek, M. Phothisonothai

Citations: 5

Abstract

This paper compares feature extraction methods for accent-dependent Thai speech from three regions: central, southern and northeastern. We investigate four frequency analysis methods: Energy Spectral Density (ESD), Power Spectral Density (PSD), Mel-Frequency Cepstral Coefficients (MFCC) and Spectrogram (SPT). A support vector machine with a radial basis function kernel is used as the classifier, evaluated with 5-fold cross-validation. The isolated-speech data sets were recorded from 30 male and 30 female participants speaking the 10 Thai digits from 0 to 9. The MFCC-based feature gives better accuracy than the ESD, PSD and SPT features. Within a single region, the MFCC-based feature provides average accuracies of 94.9% and 99.1% for male and female voices, respectively. Across the three regions, the MFCC-based feature provides average accuracies of 89.34% and 93.81% for male and female voices, respectively.
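
As a rough illustration of some building blocks named in the abstract, the sketch below computes an ESD and a periodogram-style PSD for one frame, evaluates a radial basis function kernel, and builds a 5-fold split. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the exact ESD/PSD definitions, frame length, sampling rate or kernel parameters, so the normalisation, the test tone, the value of `gamma`, the sample count of 100 and all function names are hypothetical.

```python
import cmath
import math
import random


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def energy_spectral_density(frame):
    """ESD estimate (assumed definition): squared magnitude of each DFT bin."""
    return [abs(c) ** 2 for c in dft(frame)]


def power_spectral_density(frame, sample_rate):
    """PSD estimate (assumed periodogram form): ESD divided by N * sample_rate."""
    n = len(frame)
    return [e / (n * sample_rate) for e in energy_spectral_density(frame)]


def rbf_kernel(a, b, gamma):
    """Radial basis function kernel: exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and yield (train, test) index lists for k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# Hypothetical input: a 64-sample frame of a 1 kHz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(64)]

esd = energy_spectral_density(frame)
psd = power_spectral_density(frame, SAMPLE_RATE)

# Strongest bin in the lower half of the spectrum (bin spacing is 125 Hz here).
peak_bin = max(range(len(esd) // 2), key=esd.__getitem__)

# Kernel value between the first eight ESD and PSD bins; gamma is arbitrary.
similarity = rbf_kernel(esd[:8], psd[:8], gamma=1e-6)

# Five (train, test) splits over a hypothetical set of 100 utterances.
splits = list(k_fold_indices(100, k=5))
```

In a full pipeline, each utterance would be reduced to a feature vector (for example from its ESD, PSD, MFCC or spectrogram values), and a classifier using the kernel would be trained on each `train` index list and scored on the matching `test` list, with the five scores averaged.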

