{"title":"基于多域声学特征选择和分类模型的语音情感自动检测系统","authors":"Nancy Semwal, Abhijeet Kumar, Sakthivel Narayanan","doi":"10.1109/ISBA.2017.7947681","DOIUrl":null,"url":null,"abstract":"Emotions exhibited by a speaker can be detected by analyzing his/her speech, facial expressions and gestures or by combining these properties. This paper concentrates on determining the emotional state from speech signals. Various acoustic features such as energy, zero crossing rate(ZCR), fundamental frequency, Mel Frequency Cepstral Coefficients (MFCCs), etc are extracted for short term, overlapping frames derived from the speech signal. A feature vector for every utterance is then constructed by analyzing the global statistics (mean, median, etc) of the extracted features over all frames. To select a subset of useful features from the full candidate feature vector, sequential backward selection (SBS) method is used with k-fold cross validation. Detection of emotion in the samples is done by classifying their respective feature vectors into classes, using either a pre-trained Support Vector Machine (SVM) model or Linear Discriminant Analysis (LDA) classifier. This approach is tested with two acted emotional databases - Berlin Database of Emotional Speech (EmoDB), and BML Emotion Database (RED). For multi class classification, accuracy of 80% for EmoDB and 73% for RED is achieved which are higher than or comparable to previous works on both the databases.","PeriodicalId":436086,"journal":{"name":"2017 IEEE International Conference on Identity, Security and Behavior Analysis (ISBA)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"23","resultStr":"{\"title\":\"Automatic speech emotion detection system using multi-domain acoustic feature selection and classification models\",\"authors\":\"Nancy Semwal, Abhijeet Kumar, Sakthivel Narayanan\",\"doi\":\"10.1109/ISBA.2017.7947681\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Emotions exhibited by a speaker can be detected by analyzing his/her speech, facial expressions and gestures or by combining these properties. This paper concentrates on determining the emotional state from speech signals. Various acoustic features such as energy, zero crossing rate(ZCR), fundamental frequency, Mel Frequency Cepstral Coefficients (MFCCs), etc are extracted for short term, overlapping frames derived from the speech signal. A feature vector for every utterance is then constructed by analyzing the global statistics (mean, median, etc) of the extracted features over all frames. To select a subset of useful features from the full candidate feature vector, sequential backward selection (SBS) method is used with k-fold cross validation. Detection of emotion in the samples is done by classifying their respective feature vectors into classes, using either a pre-trained Support Vector Machine (SVM) model or Linear Discriminant Analysis (LDA) classifier. This approach is tested with two acted emotional databases - Berlin Database of Emotional Speech (EmoDB), and BML Emotion Database (RED). 
For multi class classification, accuracy of 80% for EmoDB and 73% for RED is achieved which are higher than or comparable to previous works on both the databases.\",\"PeriodicalId\":436086,\"journal\":{\"name\":\"2017 IEEE International Conference on Identity, Security and Behavior Analysis (ISBA)\",\"volume\":\"62 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"23\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE International Conference on Identity, Security and Behavior Analysis (ISBA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISBA.2017.7947681\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE International Conference on Identity, Security and Behavior Analysis (ISBA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISBA.2017.7947681","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract: Emotions exhibited by a speaker can be detected by analyzing his/her speech, facial expressions, and gestures, or by combining these modalities. This paper concentrates on determining the emotional state from speech signals. Various acoustic features, such as energy, zero crossing rate (ZCR), fundamental frequency, and Mel Frequency Cepstral Coefficients (MFCCs), are extracted from short-term, overlapping frames derived from the speech signal. A feature vector for every utterance is then constructed by computing global statistics (mean, median, etc.) of the extracted features over all frames. To select a subset of useful features from the full candidate feature vector, the sequential backward selection (SBS) method is used with k-fold cross-validation. Emotion in a sample is detected by classifying its feature vector, using either a pre-trained Support Vector Machine (SVM) model or a Linear Discriminant Analysis (LDA) classifier. This approach is tested on two acted emotional databases: the Berlin Database of Emotional Speech (EmoDB) and the BML Emotion Database (RED). For multi-class classification, accuracies of 80% on EmoDB and 73% on RED are achieved, which are higher than or comparable to previous work on both databases.
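To illustrate the frame-level feature extraction and utterance-level statistics described in the abstract, the following Python sketch uses librosa and NumPy. It is not the authors' implementation; the frame and hop lengths, pitch search range, number of MFCCs, and choice of statistics are assumptions made here for illustration.

```python
# Illustrative sketch only (not the paper's code): frame-level acoustic features
# and utterance-level global statistics. Frame/hop sizes, pitch range, and MFCC
# count are assumed values, not settings taken from the paper.
import numpy as np
import librosa

def utterance_feature_vector(wav_path, frame_len_s=0.025, hop_len_s=0.010):
    """Return one fixed-length candidate feature vector for an utterance."""
    y, sr = librosa.load(wav_path, sr=None)
    n_fft = int(frame_len_s * sr)
    hop = int(hop_len_s * sr)

    # Short-term features computed on overlapping frames.
    energy = librosa.feature.rms(y=y, frame_length=n_fft, hop_length=hop)            # (1, T)
    zcr = librosa.feature.zero_crossing_rate(y, frame_length=n_fft, hop_length=hop)  # (1, T)
    f0 = librosa.yin(y, fmin=50, fmax=400, sr=sr, hop_length=hop)                    # (T,) YIN pitch, default window
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=n_fft, hop_length=hop)  # (13, T)

    # Stack all frame-level features; truncate to a common frame count for safety.
    t = min(energy.shape[1], zcr.shape[1], f0.shape[0], mfcc.shape[1])
    frames = np.vstack([energy[:, :t], zcr[:, :t], f0[np.newaxis, :t], mfcc[:, :t]])

    # Global statistics over all frames form the utterance-level candidate vector.
    stats = [np.mean(frames, axis=1), np.median(frames, axis=1), np.std(frames, axis=1),
             np.min(frames, axis=1), np.max(frames, axis=1)]
    return np.concatenate(stats)
```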
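The selection and classification stage can be sketched in a similarly hedged way with scikit-learn, using SequentialFeatureSelector in backward mode as a stand-in for SBS with k-fold cross-validation. The number of retained features, the fold count, and the SVM kernel below are illustrative assumptions, not values reported in the paper.

```python
# Illustrative sketch only (not the paper's code): sequential backward selection
# with k-fold cross-validation, followed by SVM and LDA classification.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def select_and_classify(X, y, n_features=30, k_folds=5):
    """X: (n_utterances, n_candidate_features) global-statistics matrix; y: emotion labels."""
    svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    lda = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())

    # Backward selection: start from the full candidate vector and drop the least
    # useful feature at each step, scored by k-fold cross-validation accuracy.
    sbs = SequentialFeatureSelector(svm, n_features_to_select=n_features,
                                    direction="backward", cv=k_folds)
    sbs.fit(X, y)
    X_sel = sbs.transform(X)

    # Compare SVM and LDA on the selected subset (mean k-fold CV accuracy).
    svm_acc = cross_val_score(svm, X_sel, y, cv=k_folds).mean()
    lda_acc = cross_val_score(lda, X_sel, y, cv=k_folds).mean()
    return sbs.get_support(), svm_acc, lda_acc
```

In this sketch, X would be the matrix of utterance vectors produced by the previous extraction step and y the corresponding emotion labels from a corpus such as EmoDB.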