Measuring the randomness of speech cues for emotion recognition
Seba Susan, Amandeep Kaur
2017 Tenth International Conference on Contemporary Computing (IC3), 2017-08-01
DOI: 10.1109/IC3.2017.8284298 (https://doi.org/10.1109/IC3.2017.8284298)
Citations: 5
Abstract
Recognizing a person's emotional state from speech is of great significance in modern surveillance systems. The Mel-Frequency Cepstral Coefficients (MFCC), pitch and energy are conventional speech cues that have long been linked to emotions. In our work, we measure the randomness of these cues over time to discriminate between various human emotions. Entropy is used to measure the randomness of the cues, and is computed from temporal histograms as well as temporal co-occurrence matrices. The direct values of MFCC, pitch and energy are not included; only their randomness is considered, since the actual values are often as much a characteristic of the speaker as of the emotion involved. The new set of entropy features for speech-based emotion recognition is compared with state-of-the-art methods on the benchmark SAVEE database. The higher classification accuracies obtained demonstrate the effectiveness of our approach.
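The two entropy measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the bin count, quantization scheme, temporal lag, or logarithm base, so `bins`, `levels`, `lag`, base-2 logarithms, and the function names `histogram_entropy` and `cooccurrence_entropy` are all assumptions made for illustration. The input `x` stands for the per-frame contour of a single cue (for example pitch, energy, or one MFCC coefficient) over an utterance.

```python
import numpy as np


def histogram_entropy(x, bins=16):
    """Shannon entropy (bits) of the histogram of a cue's values over time.

    `bins` is an assumed parameter; the abstract does not specify it.
    """
    x = np.asarray(x, dtype=float)
    counts, _ = np.histogram(x, bins=bins)
    # Keep only occupied bins so that log2 is never taken of zero.
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


def cooccurrence_entropy(x, levels=8, lag=1):
    """Shannon entropy (bits) of a temporal co-occurrence matrix.

    The contour is quantized into `levels` equal-width levels, and the
    matrix counts how often level i at frame t is followed by level j at
    frame t + lag. `levels` and `lag` are assumed parameters.
    """
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # A constant contour maps entirely to level 0.
        q = np.zeros(len(x), dtype=int)
    else:
        scaled = (x - lo) / (hi - lo) * levels
        # The maximum value would land on index `levels`; clip it back.
        q = np.minimum(scaled.astype(int), levels - 1)
    cooc = np.zeros((levels, levels))
    # Accumulate counts of (current level, level `lag` frames later) pairs.
    np.add.at(cooc, (q[:-lag], q[lag:]), 1)
    p = cooc[cooc > 0] / cooc.sum()
    return float(-(p * np.log2(p)).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    flat = np.full(200, 120.0)            # perfectly steady contour
    noisy = rng.uniform(80.0, 300.0, 200)  # highly irregular contour
    print(histogram_entropy(flat), histogram_entropy(noisy))
    print(cooccurrence_entropy(flat), cooccurrence_entropy(noisy))
```

Under these assumptions, a steady contour yields an entropy of zero for both measures, while an irregular contour yields a larger value. Because both measures depend only on how values are distributed and how they follow one another, not on their absolute magnitude, they fit the abstract's motivation of discarding speaker-dependent raw values.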