Panuwit Nantasri, E. Phaisangittisagul, Jessada Karnjana, Surasak Boonkla, S. Keerativittayanun, A. Rugchatjaroen, Sasiporn Usanavasin, T. Shinozaki
2020 17th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), published 2020-06-01. DOI: https://doi.org/10.1109/ecti-con49241.2020.9158221
A Light-Weight Artificial Neural Network for Speech Emotion Recognition using Average Values of MFCCs and Their Derivatives
Because memory and computational power are limited in embedded systems, this work proposes a novel approach to creating a useful set of features for improving speech emotion recognition (SER) systems. Typically, Mel Frequency Cepstral Coefficients (MFCCs) are widely used as features in SER systems. To reduce the number of parameters and the computational burden in SER applications, the average values of MFCCs, concatenated with delta and delta-delta coefficients, are used as the features for an artificial neural network (ANN) model in classification. The results demonstrate that the proposed features are comparable to state-of-the-art methods, with 87.8% for the EmoDB database and 82.3% for the RAVDESS database, respectively. Moreover, the number of parameters used in the classification model has been significantly reduced.
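The feature construction described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the MFCC frames have already been computed, assumes the common regression formula for delta coefficients with a window of 2 frames and repeated edge frames, and assumes that the MFCC, delta, and delta-delta tracks are each averaged over time before concatenation. The names `delta`, `time_mean`, and `utterance_features` are hypothetical.

```python
from typing import List

Frame = List[float]  # one frame = K cepstral coefficients


def delta(frames: List[Frame], width: int = 2) -> List[Frame]:
    """Regression-based delta over time (assumed formula).

    d_t = sum_{n=1..width} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n} n^2),
    with the first and last frames repeated at the edges.
    """
    denom = 2 * sum(n * n for n in range(1, width + 1))
    last = len(frames) - 1
    out: List[Frame] = []
    for t in range(len(frames)):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, last)][k]   # clamp at last frame
                behind = frames[max(t - n, 0)][k]     # clamp at first frame
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def time_mean(frames: List[Frame]) -> Frame:
    """Average each coefficient over all frames of the utterance."""
    count = len(frames)
    return [sum(column) / count for column in zip(*frames)]


def utterance_features(mfcc: List[Frame], width: int = 2) -> Frame:
    """Concatenate the time averages of MFCC, delta, and delta-delta.

    Assumption: each track is averaged separately, then concatenated,
    giving a vector of length 3 * K regardless of utterance duration.
    """
    d1 = delta(mfcc, width)
    d2 = delta(d1, width)
    return time_mean(mfcc) + time_mean(d1) + time_mean(d2)


if __name__ == "__main__":
    # Toy input: 5 frames, 2 coefficients per frame (not real MFCCs).
    toy = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
    print(utterance_features(toy))
```

Under these assumptions the feature vector has a fixed length of three times the number of cepstral coefficients, so the input layer of the classifier does not grow with the number of frames. That is one plausible reading of how the approach keeps the parameter count low.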