Application of neural networks in emotional speech recognition
M. Bojanic, V. Crnojevic, V. Delić. 11th Symposium on Neural Network Applications in Electrical Engineering, September 2012. DOI: 10.1109/NEUREL.2012.6420016
From the perspective of human-computer interaction (HCI), emotional speech recognition (ESR) is a prerequisite for a framework in which human and machine act as interacting partners. This paper addresses the application of neural networks (NNs) to ESR. NN performance is evaluated on three feature sets that form the basis of ESR: prosodic features, spectral features, and their combination. The results for these feature sets are compared across several network topologies and two training algorithms. The joint prosodic-spectral feature set, used as input to a three-layer feed-forward NN trained with the back-propagation algorithm, gives the best performance on a five-class emotional speech recognition task.
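The best-performing configuration named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The "three layers" are read as an input layer, one sigmoid hidden layer and a softmax output layer. The joint feature vector is assumed to be a plain concatenation of prosodic and spectral values (`joint_features`). The layer sizes, learning rate, epoch count and the toy clustered data from the hypothetical `make_toy_data` helper are invented stand-ins for real utterance-level features. Training uses plain per-sample gradient descent with back-propagation.

```python
import math
import random


def joint_features(prosodic, spectral):
    """Joint prosodic-spectral vector (assumption: simple concatenation)."""
    return list(prosodic) + list(spectral)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


class FeedForwardNet:
    """Input -> sigmoid hidden layer -> softmax output (sizes are assumptions)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]
        y = softmax([sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(self.w2, self.b2)])
        return h, y

    def train_step(self, x, target, lr):
        h, y = self.forward(x)
        # Softmax + cross-entropy: gradient w.r.t. output pre-activations is y - one_hot(target)
        d_out = [yk - (1.0 if k == target else 0.0) for k, yk in enumerate(y)]
        # Back-propagate through w2 (before updating it) and the sigmoid derivative h(1 - h)
        d_hid = [h[j] * (1.0 - h[j]) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                 for j in range(len(h))]
        for k, dk in enumerate(d_out):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * dk * hj
            self.b2[k] -= lr * dk
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj

    def predict(self, x):
        _, y = self.forward(x)
        return max(range(len(y)), key=lambda k: y[k])


def train(net, data, epochs, lr, seed=0):
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)  # visit samples in a new random order each epoch
        for x, label in data:
            net.train_step(x, label, lr)


def accuracy(net, data):
    return sum(net.predict(x) == label for x, label in data) / len(data)


def make_toy_data(n_per_class=20, n_classes=5, n_prosodic=4, n_spectral=6, noise=0.1, seed=1):
    """Hypothetical noisy clusters standing in for real emotional-speech features."""
    rng = random.Random(seed)
    dim = n_prosodic + n_spectral
    data = []
    for c in range(n_classes):
        center = [1.0 if d % n_classes == c else 0.0 for d in range(dim)]
        for _ in range(n_per_class):
            v = [m + rng.gauss(0.0, noise) for m in center]
            data.append((joint_features(v[:n_prosodic], v[n_prosodic:]), c))
    return data


if __name__ == "__main__":
    data = make_toy_data()
    net = FeedForwardNet(n_in=10, n_hidden=8, n_out=5)
    train(net, data, epochs=100, lr=0.3)
    print(f"training accuracy: {accuracy(net, data):.2f}")
```

To mirror the three-way comparison in the abstract, one could feed the network only the prosodic slice, only the spectral slice, or the concatenation, and vary `n_hidden` to probe different topologies. The abstract does not name the second training algorithm or the actual features, so this sketch leaves them out.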