Title: An improved characterization methodology to efficiently deal with the speech emotion recognition problem
Authors: Bryan E. Martínez, J. C. Jacobo
DOI: 10.1109/ROPEC.2017.8261686
Published in: 2017 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC), November 2017
Citations: 5
Abstract
Recognizing the speaker's emotional state is expected to become one of the most common tasks in human-computer interaction. This task is known as Speech Emotion Recognition (SER). Previous works have developed characterizations that rely heavily on some feature selection method to choose the best subset of features. To our knowledge, no effort has been invested in transforming the original features themselves to improve classification. In this work, a methodology for feature preprocessing is presented. To this end, our characterization method extracts different characteristics, as well as statistics, from a speech signal. These characteristics then go through a preprocessing phase that enhances classification efficiency. After this, a two-stage classification scheme is used: in the first stage, k-Means is used for clustering, and in the second stage, several standard classifiers are applied. Consistently across the classifiers, except for SVM, this strategy achieves higher classification rates (91–100%) than those reported in previous works.
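The pipeline the abstract outlines — extract statistics from a speech signal, preprocess them, cluster with k-Means, then classify within each cluster — can be sketched as below. This is a minimal illustration under assumptions: the statistics (`extract_stats`), the z-score preprocessing, and the per-cluster 1-nearest-neighbour classifier are stand-ins, since the paper's exact features, preprocessing steps, and "standard classifiers" are not specified in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_stats(signal):
    # Hypothetical frame-level statistics; the paper's actual
    # characteristics are not detailed in the abstract.
    return np.array([signal.mean(), signal.std(),
                     np.abs(signal).max(), np.median(signal)])

def zscore(X):
    # Feature preprocessing: standardize each feature column
    # (one plausible choice, not necessarily the paper's).
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)

def kmeans(X, k, iters=50):
    # Plain Lloyd's k-Means: assign to nearest centroid, then update.
    centroids = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        labels = ((X[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    # Final assignment against the final centroids.
    labels = ((X[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(1)
    return centroids, labels

def predict(x, centroids, X, y, labels):
    # Stage 1: route the sample to its nearest k-Means cluster.
    c = ((centroids - x) ** 2).sum(axis=1).argmin()
    idx = np.where(labels == c)[0]
    if len(idx) == 0:
        idx = np.arange(len(X))
    # Stage 2: 1-NN among that cluster's training samples
    # (stand-in for the paper's standard classifiers).
    nn = idx[((X[idx] - x) ** 2).sum(axis=1).argmin()]
    return y[nn]

# Synthetic demo: two "emotions" as signals with different energy.
signals = [rng.normal(0.0, 1.0 + cls, 400) for cls in (0, 1) for _ in range(20)]
y = np.array([0] * 20 + [1] * 20)
X = zscore(np.stack([extract_stats(s) for s in signals]))
centroids, labels = kmeans(X, k=2)
preds = np.array([predict(x, centroids, X, y, labels) for x in X])
acc = (preds == y).mean()
```

The clustering stage partitions the feature space so that each second-stage classifier only has to separate emotions within a more homogeneous region; the demo above evaluates on its own training set purely to show the two stages wired together.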