Speech Emotion Recognition Using a New Hybrid Quaternion-Based Echo State Network-Bilinear Filter

Fatemeh Daneshfar, S. J. Kabudian
Published in: 2021 7th International Conference on Signal Processing and Intelligent Systems (ICSPIS)
Publication date: 2021-12-29
DOI: 10.1109/ICSPIS54653.2021.9729337
URL: https://doi.org/10.1109/ICSPIS54653.2021.9729337
Citations: 2

Abstract

The echo state network (ESN) is an efficient tool for modeling dynamic data. However, ESNs have limitations when modeling high-dimensional data. The most important are the large amount of memory consumed by the echo states and the linear output of the network, which together prevent increasing the number of reservoir units and making effective use of the higher-order statistics of the features those units provide. This research presents a new ESN-based structure in which quaternion algebra, with the simple split function, is used to compress the network data, and the linear output combiner is replaced by a multidimensional bilinear filter. This filter performs the nonlinear computations of the ESN output layer. In addition, two-dimensional principal component analysis (2DPCA) is used to reduce the amount of data passed to the bilinear filter. The coefficients and weights of the quaternion nonlinear ESN (QNESN) are optimized using a genetic algorithm (GA). To evaluate the proposed model against previous methods, speech emotion recognition (SER) experiments were performed on the EMODB dataset. Comparisons show that the proposed QNESN performs better than the simple ESN and most current SER systems.
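The pipeline described in the abstract (reservoir states, 2DPCA reduction, bilinear output) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the reservoir size, the spectral radius of 0.9, the uniform weight initialization, and the bilinear form y = u^T X v are all assumptions chosen for illustration. The quaternion compression and the genetic-algorithm optimization are left out, and the "utterances" are random stand-ins rather than speech features.

```python
import numpy as np


def make_reservoir(n_inputs, n_units, spectral_radius=0.9, seed=0):
    """Random input and recurrent weights, rescaled to a target spectral radius."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_units, n_inputs))
    w = rng.uniform(-0.5, 0.5, size=(n_units, n_units))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w


def run_reservoir(w_in, w, inputs):
    """Drive the reservoir with inputs of shape (T, n_inputs); return states (T, n_units)."""
    x = np.zeros(w.shape[0])
    states = np.empty((inputs.shape[0], w.shape[0]))
    for t, u in enumerate(inputs):
        x = np.tanh(w_in @ u + w @ x)  # standard (non-leaky) ESN state update
        states[t] = x
    return states


def fit_2dpca(mats, n_components):
    """2DPCA on a stack of matrices (M, T, N); returns a projection of shape (N, n_components)."""
    centered = mats - mats.mean(axis=0)
    # Column covariance G = (1/M) * sum_i (X_i - mean)^T (X_i - mean), shape (N, N).
    g = np.einsum("mti,mtj->ij", centered, centered) / mats.shape[0]
    _, eigvecs = np.linalg.eigh(g)  # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :n_components]  # keep the leading eigenvectors


def bilinear_readout(u, x, v):
    """Bilinear form y = u^T X v for a matrix X of shape (T, d)."""
    return u @ x @ v


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    w_in, w = make_reservoir(n_inputs=3, n_units=20)
    # Five toy "utterances", each 50 frames of 3 features (random stand-ins).
    state_mats = np.stack(
        [run_reservoir(w_in, w, rng.normal(size=(50, 3))) for _ in range(5)]
    )
    proj = fit_2dpca(state_mats, n_components=4)
    reduced = state_mats @ proj  # shape (5, 50, 4)
    u, v = rng.normal(size=50), rng.normal(size=4)  # untrained readout weights
    scores = [bilinear_readout(u, x, v) for x in reduced]
    print(reduced.shape, len(scores))
```

In this sketch the readout weights u and v are random. In the setting the abstract describes they would be the quantities tuned by the optimizer, and a linear readout would use only a single weight vector applied to one state vector.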