T. Villa-Cañas, J. Orozco-Arroyave, J. D. Arias-Londoño, J. Vargas-Bonilla, J. Godino-Llorente
Symposium of Signals, Images and Artificial Vision - 2013: STSIVA - 2013. Published 2013-10-24. DOI: 10.1109/STSIVA.2013.6644930 (https://doi.org/10.1109/STSIVA.2013.6644930)
Automatic assessment of voice signals according to the GRBAS scale using modulation spectra, Mel frequency Cepstral Coefficients and Noise parameters
This paper presents a system for the automatic assessment of voice quality according to the GRBAS scale, based on several speech measures. The feature set includes the centroids and the energy content of different frequency bands in the modulation spectra of the recordings, Mel-Frequency Cepstral Coefficients, Harmonics-to-Noise Ratio, Normalized Noise Energy and Glottal-to-Noise Excitation Ratio. Additionally, to eliminate possible redundancy in the information provided by the features, two feature extraction techniques are applied: Principal Component Analysis and Linear Discriminant Analysis. The multiclass classification is performed with a K-Nearest Neighbors classifier. The performance of the system is measured in terms of efficiency and the Kappa statistical agreement index. The results show that this approach provides acceptable results for this purpose, with the best efficiency, around 89.3%, obtained for Asthenia (A).
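The classification and evaluation steps described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation. The 2-D feature vectors and grades are invented toy values, k = 3 is an arbitrary choice, "efficiency" is assumed to mean the fraction of correctly graded recordings, and Kappa is assumed to be the unweighted Cohen's kappa. The abstract does not confirm any of these details, and the function names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label a query vector by majority vote among its k nearest training vectors."""
    # Rank training samples by Euclidean distance to the query.
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    # Break vote ties in favour of the lowest grade so the result is deterministic.
    best = max(votes.values())
    return min(label for label, count in votes.items() if count == best)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted grade equals the reference grade."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa: agreement corrected for chance agreement."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal frequencies, summed over grades.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_expected == 1.0:
        return 1.0  # Both label sets contain one identical grade only.
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy data (assumption): 2-D feature vectors with grades 0-3 for one GRBAS trait.
train_x = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (1.1, 0.9),
           (2.0, 2.0), (2.1, 1.9), (3.0, 3.0), (3.1, 2.9)]
train_y = [0, 0, 1, 1, 2, 2, 3, 3]
test_x = [(0.05, 0.1), (1.05, 1.0), (2.0, 2.1), (2.9, 3.0), (1.6, 1.5)]
test_y = [0, 1, 2, 3, 1]

predictions = [knn_predict(train_x, train_y, x, k=3) for x in test_x]
acc = accuracy(test_y, predictions)
kappa = cohen_kappa(test_y, predictions)
print(predictions, acc, round(kappa, 3))
```

In a fuller pipeline the feature vectors would first be reduced with PCA or LDA, and one such classifier would be trained per GRBAS trait. Reporting kappa next to efficiency is useful because kappa discounts the agreement that an unbalanced grade distribution would produce by chance.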