Speech Emotion Recognition Using a New Hybrid Quaternion-Based Echo State Network-Bilinear Filter

2021 7th International Conference on Signal Processing and Intelligent Systems (ICSPIS). Pub Date: 2021-12-29. DOI: 10.1109/ICSPIS54653.2021.9729337

Fatemeh Daneshfar, S. J. Kabudian


Citations: 2

Abstract



Echo state networks (ESNs) are efficient tools for modeling dynamic data. However, ESNs have limitations when modeling high-dimensional data. The most important are the large amount of memory consumed by the echo states and the linear output layer of the ESN, which together prevent increasing the number of reservoir units and making effective use of the higher-order statistics of the features that the reservoir units provide. This work presents a new ESN-based structure in which quaternion algebra, with a simple split function, is used to compress the network data, and the linear output combiner is replaced by a multidimensional bilinear filter that performs the nonlinear computations of the output layer. In addition, two-dimensional principal component analysis (2DPCA) is used to reduce the amount of data passed to the bilinear filter. The coefficients and weights of the resulting quaternion nonlinear ESN (QNESN) are optimized with a genetic algorithm (GA). To demonstrate the effectiveness of the proposed model relative to previous methods, speech emotion recognition (SER) experiments were performed on the EMODB dataset. The comparisons show that the proposed QNESN performs better than a simple ESN and most current SER systems.
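The abstract does not give the network equations, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It assumes the "split function" is the usual split activation from quaternion-valued networks (a real tanh applied to each of the four components), that the reservoir follows the standard ESN update with quaternion weights and the Hamilton product, and that the bilinear output has the form y = aᵀXb, where X stacks the four components of each reservoir unit. The names `Quaternion`, `split_tanh`, `reservoir_step`, and `bilinear_readout` are hypothetical, and 2DPCA and the GA-based weight search are omitted.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Quaternion:
    """q = a + b*i + c*j + d*k with real components."""
    a: float
    b: float
    c: float
    d: float

    def __add__(self, other):
        return Quaternion(self.a + other.a, self.b + other.b,
                          self.c + other.c, self.d + other.d)

    def __mul__(self, other):
        # Hamilton product; it is not commutative (i*j = k but j*i = -k).
        a1, b1, c1, d1 = self.a, self.b, self.c, self.d
        a2, b2, c2, d2 = other.a, other.b, other.c, other.d
        return Quaternion(
            a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
        )


def split_tanh(q):
    """Assumed split activation: tanh applied to each component separately."""
    return Quaternion(math.tanh(q.a), math.tanh(q.b),
                      math.tanh(q.c), math.tanh(q.d))


def reservoir_step(state, u, w_in, w_res):
    """One assumed reservoir update for a single quaternion input u:
    x_i(t+1) = split_tanh(w_in[i] * u + sum_j w_res[i][j] * x_j(t))."""
    new_state = []
    for i in range(len(state)):
        acc = w_in[i] * u
        for j, x_j in enumerate(state):
            acc = acc + w_res[i][j] * x_j
        new_state.append(split_tanh(acc))
    return new_state


def bilinear_readout(state, a, b):
    """Assumed bilinear output y = a^T X b, where row i of the N x 4
    matrix X holds the four real components of reservoir unit i."""
    rows = [(q.a, q.b, q.c, q.d) for q in state]
    return sum(a[i] * rows[i][k] * b[k]
               for i in range(len(rows)) for k in range(4))


if __name__ == "__main__":
    zero = Quaternion(0.0, 0.0, 0.0, 0.0)
    state = [zero, zero]  # two quaternion units carry eight real values
    w_in = [Quaternion(0.5, 0.0, 0.0, 0.0), Quaternion(0.0, 0.5, 0.0, 0.0)]
    w_res = [[Quaternion(0.1, 0.0, 0.0, 0.0), zero],
             [zero, Quaternion(0.0, 0.0, 0.1, 0.0)]]
    inputs = [Quaternion(1.0, 0.2, -0.3, 0.0), Quaternion(0.4, 0.0, 0.1, 0.9)]
    for u in inputs:
        state = reservoir_step(state, u, w_in, w_res)
    print(bilinear_readout(state, a=[1.0, -1.0], b=[0.25, 0.25, 0.25, 0.25]))
```

In this sketch each quaternion weight couples four real channels through a single Hamilton product, which is the usual argument for quaternion models needing fewer parameters than a real-valued network of the same width. The bilinear readout multiplies a per-unit weight with a per-component weight, so its output is no longer a single linear combination with independent weights for every state value. Whether the paper's filter takes exactly this form cannot be confirmed from the abstract.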
