S. Tantisatirapong, Chalisa Prasoproek, M. Phothisonothai
2018 IEEE Seventh International Conference on Communications and Electronics (ICCE), July 2018. DOI: 10.1109/CCE.2018.8465705
Comparison of Feature Extraction for Accent Dependent Thai Speech Recognition System
This paper compares feature extraction methods for accent-dependent Thai speech from three regions: central, southern, and northeastern Thailand. We investigate four frequency analysis methods: Energy Spectral Density (ESD), Power Spectral Density (PSD), Mel-Frequency Cepstral Coefficients (MFCC), and Spectrogram (SPT). A support vector machine (SVM) with a radial basis function (RBF) kernel is used as the classifier, evaluated with 5-fold cross-validation. The isolated-word speech data sets were recorded from 30 male and 30 female participants speaking the ten Thai digits 0 to 9. The MFCC-based feature achieves higher accuracy than ESD, PSD, and SPT. Within a single region, the MFCC-based feature achieves average accuracies of 94.9% and 99.1% for male and female voices, respectively. Across all three regions, it achieves average accuracies of 89.34% and 93.81% for male and female voices, respectively.
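The four front ends compared above can be illustrated with a minimal, stdlib-only Python sketch. This is not the authors' implementation: the function names (`power_spectrum`, `esd`, `psd`, `mfcc`, `spectrogram`), the 8 kHz sampling rate, the Hamming window, the 20-filter mel bank using the common 2595·log10(1 + f/700) mel scale, and the 13 cepstral coefficients are all illustrative assumptions, since the abstract does not state the authors' parameters. A direct O(N²) DFT is used for readability instead of an FFT.

```python
import cmath
import math


def power_spectrum(x):
    """Squared DFT magnitude |X[k]|^2 for bins k = 0..N//2 (direct O(N^2) DFT)."""
    n = len(x)
    out = []
    for k in range(n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        out.append(abs(s) ** 2)
    return out


def esd(x):
    """Energy spectral density of a finite-energy signal: |X[k]|^2 (one-sided bins)."""
    return power_spectrum(x)


def psd(x, fs):
    """Periodogram PSD estimate |X[k]|^2 / (fs * N); no one-sided doubling applied."""
    n = len(x)
    return [p / (fs * n) for p in power_spectrum(x)]


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters evenly spaced on the mel scale over n_fft//2 + 1 bins."""
    low, high = hz_to_mel(0.0), hz_to_mel(fs / 2.0)
    mels = [low + i * (high - low) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / fs)) for m in mels]
    bank = []
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):  # rising edge of the triangle
            row[k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):  # falling edge of the triangle
            row[k] = (right - k) / max(right - centre, 1)
        bank.append(row)
    return bank


def mfcc(frame, fs, n_filters=20, n_ceps=13):
    """MFCCs of one frame: Hamming window -> power spectrum -> mel bank -> log -> DCT-II."""
    n = len(frame)
    windowed = [frame[t] * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
                for t in range(n)]
    spec = power_spectrum(windowed)
    energies = [sum(w * p for w, p in zip(row, spec))
                for row in mel_filterbank(n_filters, n, fs)]
    log_e = [math.log(e + 1e-10) for e in energies]  # small offset avoids log(0)
    return [sum(log_e[m] * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m in range(n_filters))
            for c in range(n_ceps)]


def spectrogram(x, frame_len, hop):
    """Power spectrogram: one power spectrum per rectangular-windowed frame."""
    return [power_spectrum(x[s:s + frame_len])
            for s in range(0, len(x) - frame_len + 1, hop)]


# Usage: a 1 kHz tone sampled at 8 kHz, 256 samples (1 kHz falls exactly on bin 32)
fs = 8000
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(256)]
e = esd(tone)
peak = max(range(len(e)), key=e.__getitem__)
print(peak * fs / 256)  # 1000.0
print(len(mfcc(tone, fs)))  # 13
```

The RBF-kernel SVM with 5-fold cross-validation is not reproduced here. One option, assuming scikit-learn is available, is to stack one feature vector per utterance into a matrix and pass `sklearn.svm.SVC(kernel="rbf")` to `sklearn.model_selection.cross_val_score(..., cv=5)`.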