Comparison of Feature Extraction for Accent Dependent Thai Speech Recognition System

S. Tantisatirapong, Chalisa Prasoproek, M. Phothisonothai
DOI: 10.1109/CCE.2018.8465705
Published in: 2018 IEEE Seventh International Conference on Communications and Electronics (ICCE)
Publication date: 2018-07-01
URL: https://doi.org/10.1109/CCE.2018.8465705
Citations: 5

Abstract

This paper compares feature extraction methods for accent-dependent Thai speech from three regions: central, southern and northeastern. We investigate four frequency analysis methods: Energy Spectral Density (ESD), Power Spectral Density (PSD), Mel-Frequency Cepstral Coefficients (MFCC) and Spectrogram (SPT). A support vector machine with a radial basis function kernel is used as the classifier, with 5-fold cross-validation. The isolated-speech data sets were recorded from 30 male and 30 female participants speaking the 10 Thai digits from 0 to 9. The MFCC-based feature gives better accuracy than the ESD-, PSD- and SPT-based features. Within a single region, the MFCC-based feature provides average accuracy of 94.9% and 99.1% for male and female voices, respectively. Across the three regions, the MFCC-based feature provides average accuracy of 89.34% and 93.81% for male and female voices, respectively.
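The classifier setup named in the abstract (a radial basis function kernel evaluated with 5-fold cross-validation) can be illustrated with a minimal, standard-library sketch. This is not the authors' implementation. The kernel width `gamma`, the contiguous, unshuffled fold layout, and the helper names `rbf_kernel` and `k_fold_indices` are all assumptions made here for illustration; the abstract does not state the kernel parameters or how the folds were drawn.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel value exp(-gamma * ||x - y||^2) for two feature vectors.

    gamma is a hypothetical kernel width; the abstract does not report it.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists for k contiguous, unshuffled folds.

    Assumption: plain contiguous folds. A real experiment would usually
    shuffle or stratify by class before splitting.
    """
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Usage: identical vectors have similarity 1; 10 samples give 5 folds of 2.
same = rbf_kernel([0.0, 0.0], [0.0, 0.0], gamma=0.5)
folds = list(k_fold_indices(10, k=5))
```

In each of the five rounds, one fold is held out for testing and the classifier is trained on the remaining four; the reported accuracy would then be the average over the rounds.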