Emotional Speech Recognition Using Particle Swarm Optimization Algorithm

Surjyo Narayana Panigrahi, H. Palo
Venue: 2021 International Conference in Advances in Power, Signal, and Information Technology (APSIT)
Published: 2021-10-08
DOI: https://doi.org/10.1109/APSIT52773.2021.9641247
Citations: 0

Abstract

The last decade has seen several studies on classifying speech emotions, a few with excellent outcomes. Researchers have combined several feature extraction techniques to build identification systems with enhanced accuracy. However, such hybrid feature sets contain redundant data and therefore increase the feature dimension. This results in an exponential increase in storage space and computational complexity, along with a slower system response. To alleviate these issues, the authors explore Particle Swarm Optimization (PSO) to optimize a few extracted feature sets for improved Speech Emotion Recognition Accuracy (SERA). The widely discussed and informative spectral features, however, analyze the signal over the entire frequency range and are therefore also associated with information irrelevant to emotion. For this reason, the authors consider only spectral features such as the Spectral Roll-off (SR), Spectral Flux (SF), Spectral Centroid (SC), and the formants, extracted in a few chosen sub-bands. A Radial Basis Function Neural Network (RBFNN) is simulated to build the classification models from the derived feature sets. The Surrey Audio-Visual Expressed Emotion (SAVEE) dataset is used for the analysis because it is in English and has been widely used in earlier work. The results show that SERA with the optimized technique improves over the baseline techniques.
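The abstract names the sub-band spectral features but gives no formulas or parameter values. As an illustration only, the following minimal sketch computes the Spectral Centroid, Spectral Roll-off, and Spectral Flux within a single sub-band using common textbook definitions. The function name `subband_spectral_features`, the band limits, the frame length and hop, and the 85% roll-off fraction are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def subband_spectral_features(signal, sample_rate, band=(300.0, 3400.0),
                              frame_len=512, hop=256, rolloff_fraction=0.85):
    """Per-frame spectral centroid, roll-off, and flux inside one sub-band.

    All default parameter values are illustrative assumptions, not values
    reported by the paper.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    band_freqs = freqs[in_band]

    centroids, rolloffs, fluxes = [], [], []
    prev_norm = None
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        # Magnitude spectrum restricted to the chosen sub-band.
        mag = np.abs(np.fft.rfft(frame))[in_band]
        total = mag.sum()

        if total > 0:
            # Centroid: magnitude-weighted mean frequency in the band.
            centroid = float((band_freqs * mag).sum() / total)
            # Roll-off: lowest band frequency at which the cumulative
            # magnitude reaches the chosen fraction of the band total.
            cumulative = np.cumsum(mag)
            idx = int(np.searchsorted(cumulative, rolloff_fraction * total))
            rolloff = float(band_freqs[min(idx, len(band_freqs) - 1)])
            norm = mag / total
        else:
            # Silent frame: report zeros and keep the all-zero spectrum.
            centroid, rolloff, norm = 0.0, 0.0, mag

        # Flux: squared change of the normalised spectrum between
        # consecutive frames (defined as zero for the first frame).
        flux = 0.0 if prev_norm is None else float(((norm - prev_norm) ** 2).sum())
        prev_norm = norm

        centroids.append(centroid)
        rolloffs.append(rolloff)
        fluxes.append(flux)

    return np.array(centroids), np.array(rolloffs), np.array(fluxes)
```

Calling the function once per chosen sub-band and concatenating summary statistics of the three returned arrays would give one feature vector per utterance, which could then be passed to a feature optimizer and a classifier. How the authors actually combine the sub-bands is not stated in the abstract.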