{"title":"优化频率范围的MFCC功能:情感识别的重要步骤","authors":"Subhasmita Sahoo, A. Routray","doi":"10.1109/ICSMB.2016.7915112","DOIUrl":null,"url":null,"abstract":"One of the major challenge in human emotion recognition is extraction of features containing maximum prosodic information. The accuracy of entire emotion detection system eventually relies upon the efficiency of the selected feature. When it comes to identifying emotions from voice, ambiguity in detection can never be completely avoided due to several reasons. Exclusion of redundant information to reduce confusion in recognizing emotions is quite challenging. The primary objective of this work is to improve the accuracy of existing emotion recognition method that uses Mel frequency Cepstral Coefficient (MFCC) feature. In this work, an additional step has been introduced to the method to make it more efficient for recognizing emotions from voice. Instead of taking the whole signal frequency range for filter bank analysis in MFCC computation, it has been suggested to optimize the analysis frequency range for maximum accuracy. The proposed method has been tested on two standard speech emotion databases: Berlin Emo-DB database [1] and Assamese database [2]. The addition of this extra step has been found to be increasing speaker-independent emotion recognition accuracy by 15% for Assamese database and around 25% for Berlin database.","PeriodicalId":231556,"journal":{"name":"2016 International Conference on Systems in Medicine and Biology (ICSMB)","volume":"172 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"MFCC feature with optimized frequency range: An essential step for emotion recognition\",\"authors\":\"Subhasmita Sahoo, A. Routray\",\"doi\":\"10.1109/ICSMB.2016.7915112\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"One of the major challenge in human emotion recognition is extraction of features containing maximum prosodic information. The accuracy of entire emotion detection system eventually relies upon the efficiency of the selected feature. When it comes to identifying emotions from voice, ambiguity in detection can never be completely avoided due to several reasons. Exclusion of redundant information to reduce confusion in recognizing emotions is quite challenging. The primary objective of this work is to improve the accuracy of existing emotion recognition method that uses Mel frequency Cepstral Coefficient (MFCC) feature. In this work, an additional step has been introduced to the method to make it more efficient for recognizing emotions from voice. Instead of taking the whole signal frequency range for filter bank analysis in MFCC computation, it has been suggested to optimize the analysis frequency range for maximum accuracy. The proposed method has been tested on two standard speech emotion databases: Berlin Emo-DB database [1] and Assamese database [2]. 
The addition of this extra step has been found to be increasing speaker-independent emotion recognition accuracy by 15% for Assamese database and around 25% for Berlin database.\",\"PeriodicalId\":231556,\"journal\":{\"name\":\"2016 International Conference on Systems in Medicine and Biology (ICSMB)\",\"volume\":\"172 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 International Conference on Systems in Medicine and Biology (ICSMB)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICSMB.2016.7915112\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 International Conference on Systems in Medicine and Biology (ICSMB)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSMB.2016.7915112","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
MFCC feature with optimized frequency range: An essential step for emotion recognition
One of the major challenges in human emotion recognition is the extraction of features containing maximum prosodic information. The accuracy of the entire emotion detection system ultimately depends on the efficiency of the selected features. When it comes to identifying emotions from voice, ambiguity in detection can never be completely avoided, for several reasons, and excluding redundant information to reduce confusion in recognizing emotions is quite challenging. The primary objective of this work is to improve the accuracy of an existing emotion recognition method that uses the Mel-frequency cepstral coefficient (MFCC) feature. An additional step has been introduced to make the method more effective for recognizing emotions from voice: instead of using the whole signal frequency range for filter-bank analysis in the MFCC computation, the analysis frequency range is optimized for maximum accuracy. The proposed method has been tested on two standard speech emotion databases, the Berlin Emo-DB database [1] and an Assamese database [2]. The addition of this extra step was found to increase speaker-independent emotion recognition accuracy by 15% for the Assamese database and by around 25% for the Berlin database.
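
The sketch below illustrates the general idea of band-limited MFCC extraction described in the abstract; it is not the authors' implementation. It restricts the mel filter bank to a chosen [fmin, fmax] band and grid-searches candidate bands by a user-supplied accuracy score. The candidate bands, frame settings, and the scoring callback are illustrative assumptions, and the recognition back end (the classifier used to score each band) is not reproduced here.

```python
# Minimal sketch (assumed, not the paper's exact pipeline): compute MFCCs over a
# restricted analysis band and pick the band that maximizes recognition accuracy.
import librosa
import numpy as np

def band_limited_mfcc(y, sr, fmin, fmax, n_mfcc=13):
    """MFCCs whose mel filter bank covers only [fmin, fmax] Hz."""
    return librosa.feature.mfcc(
        y=y, sr=sr, n_mfcc=n_mfcc,
        n_fft=int(0.025 * sr), hop_length=int(0.010 * sr),  # 25 ms frames, 10 ms hop
        fmin=fmin, fmax=fmax,  # restrict the filter-bank analysis range
    )

def optimize_frequency_range(y, sr, score_fn, candidates):
    """Return the (fmin, fmax) pair whose MFCCs maximize score_fn.

    score_fn is a stand-in for the recognition back end, e.g. cross-validated
    classifier accuracy on a labeled emotion corpus (not shown here).
    """
    best_range, best_score = None, -np.inf
    for fmin, fmax in candidates:
        score = score_fn(band_limited_mfcc(y, sr, fmin, fmax))
        if score > best_score:
            best_range, best_score = (fmin, fmax), score
    return best_range

# Hypothetical usage for 16 kHz speech:
# candidates = [(0, 8000), (100, 4000), (300, 3400), (50, 6000)]
# best = optimize_frequency_range(y, sr, my_accuracy_fn, candidates)
```

In this setup the full-band configuration (0 Hz to the Nyquist frequency) is just one candidate among several, so the search can only match or improve on standard full-range MFCC extraction with respect to the chosen accuracy score.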