{"title":"Emotion recognition from speech signal using mel-frequency cepstral coefficients","authors":"O. Korkmaz, A. Atasoy","doi":"10.1109/ELECO.2015.7394435","DOIUrl":null,"url":null,"abstract":"In this paper, mel-frequency cepstral coefficients are investigated for emotional content of speech signal. The features are extracted from spoken utterance. When these features are extracted, speech signal is divided small frames and each frame overlap a part of previous frame. The purpose of this overlap operation is to provide a smooth transition from one frame to the other and, to prevent information loss in the end of the frame. The length of frame and scroll time is important for emotion recognition applications. Also, we investigated the effects of different length frames and scroll times on the classification success of four emotions which are defined as happy, angry, neutral and sad. Those emotions were classified by using Support Vector Machine and k-Nearest Neighbors algorithms. In this study to determine the classification success, 10-Fold Cross Validation method was used and the maximum success rate was obtained as 98.7 %.","PeriodicalId":369687,"journal":{"name":"2015 9th International Conference on Electrical and Electronics Engineering (ELECO)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 9th International Conference on Electrical and Electronics Engineering (ELECO)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ELECO.2015.7394435","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9
Abstract
In this paper, mel-frequency cepstral coefficients are investigated for emotional content of speech signal. The features are extracted from spoken utterance. When these features are extracted, speech signal is divided small frames and each frame overlap a part of previous frame. The purpose of this overlap operation is to provide a smooth transition from one frame to the other and, to prevent information loss in the end of the frame. The length of frame and scroll time is important for emotion recognition applications. Also, we investigated the effects of different length frames and scroll times on the classification success of four emotions which are defined as happy, angry, neutral and sad. Those emotions were classified by using Support Vector Machine and k-Nearest Neighbors algorithms. In this study to determine the classification success, 10-Fold Cross Validation method was used and the maximum success rate was obtained as 98.7 %.