Emotional Speech Recognition Using Particle Swarm Optimization Algorithm
Authors: Surjyo Narayana Panigrahi, H. Palo
Published in: 2021 International Conference in Advances in Power, Signal, and Information Technology (APSIT), 2021-10-08
DOI: 10.1109/APSIT52773.2021.9641247
The last decade has seen many studies on classifying emotions in speech, some with excellent results. Researchers have combined several feature extraction techniques to build recognition models with higher accuracy. However, such hybrid feature sets contain redundant data and therefore have a higher dimension. This leads to an exponential increase in storage space and computational complexity and to a slower system response. To alleviate these issues, the authors explore Particle Swarm Optimization (PSO) to optimize a few extracted feature sets for improved Speech Emotion Recognition Accuracy (SERA). The widely discussed and informative spectral features are usually computed over the entire frequency range and so carry information that is irrelevant to emotion. The authors therefore consider only the spectral features, such as the Spectral Roll-off (SR), Spectral Flux (SF), Spectral Centroid (SC), and the formants, extracted in selected sub-bands. A Radial Basis Function Neural Network (RBFNN) is simulated to build the classification models from the derived feature sets. The Surrey Audio-Visual Expressed Emotion (SAVEE) dataset is used for the analysis because it is in English and has been widely used in earlier work. The results show that the optimized technique improves SERA over the baseline techniques.
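The abstract does not state which PSO variant or fitness function the authors use. As a rough illustration of how PSO can prune a redundant feature set, the sketch below assumes a binary PSO with a sigmoid transfer function, in which each particle is a 0/1 mask over the features. The names `binary_pso`, `toy_fitness`, and `INFORMATIVE`, as well as all parameter values, are hypothetical. The toy fitness stands in for the classifier accuracy (for example, that of an RBFNN) that a real system would evaluate on each candidate subset.

```python
import math
import random

# Hypothetical ground truth for the toy problem: only these feature
# indices are useful; every other feature is redundant.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    """Stand-in for classifier accuracy: reward informative features
    and penalise the size of the selected subset."""
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


def binary_pso(fitness, n_features, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximise fitness(mask) over 0/1 masks with a binary PSO.

    Assumed variant: real-valued velocities are passed through a
    sigmoid to give the probability that each bit is set to 1.
    """
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)]
           for _ in range(n_particles)]

    # Personal bests start at the initial positions.
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best.
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp so the sigmoid never saturates completely.
                vel[i][d] = max(-4.0, min(4.0, v))
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob_one else 0

            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit

    return gbest, gbest_fit


if __name__ == "__main__":
    mask, fit = binary_pso(toy_fitness, n_features=8)
    selected = [i for i, bit in enumerate(mask) if bit]
    print("selected features:", selected, "fitness:", fit)
```

In a real pipeline, `fitness` would train and validate a classifier on the columns selected by `mask`. The size penalty (0.1 per feature here, an arbitrary choice) is what pushes the swarm towards smaller subsets.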