Speech Emotion Recognition using Hybrid Architectures

Q3 Computer Science

International Journal of Computing, Pub Date: 2024-04-01, DOI: 10.47839/ijc.23.1.3430

Michael Norval, Zenghui Wang


Citations: 0

Abstract


Speech Emotion Recognition using Hybrid Architectures

The detection of human emotions from speech signals remains a challenging problem in audio processing and human-computer interaction. This study introduces a novel approach to Speech Emotion Recognition (SER) that combines a Dendritic Layer with a Capsule Network (DendCaps). A hybrid of a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network, referred to as CLSTM, serves as the baseline against which the DendCaps model is compared. Integrating dendritic layers and capsule networks for speech emotion detection can harness the advantages of both architectures, potentially leading to more sophisticated and accurate models. Dendritic layers, inspired by the nonlinear processing properties of dendritic trees in biological neurons, can handle the intricate patterns and variability inherent in speech signals. Capsule networks, with their dynamic routing mechanisms, are adept at preserving hierarchical spatial relationships within the data, enabling the model to capture finer emotional subtleties in human speech. The main motivation for using DendCaps is to bridge the gap between the capabilities of biological neural systems and artificial neural networks. The combination aims to capitalize on the hierarchical nature of speech data so that intricate patterns and dependencies can be better captured. Finally, two ensemble methods, stacking and boosting, are applied to the CLSTM and DendCaps networks; the experimental results show that stacking the two networks gives the best result, with 75% accuracy.
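
The abstract mentions the dynamic routing mechanism of capsule networks without describing it. As a rough illustration only, the sketch below implements the commonly used routing-by-agreement procedure in plain Python. It is not the authors' DendCaps implementation: the function names, the capsule counts (12 input capsules, 4 output capsules), the vector dimension, and the random prediction vectors are all hypothetical choices made for this example.

```python
import math
import random


def squash(s):
    """Capsule squashing: keep the vector's direction, map its length into [0, 1)."""
    sq_norm = sum(x * x for x in s)
    if sq_norm == 0.0:
        return [0.0 for _ in s]
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in s]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, num_iters=3):
    """Routing by agreement.

    u_hat[i][j] is the prediction vector that input capsule i makes for
    output capsule j. Returns one output vector per output capsule.
    """
    num_in, num_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * num_out for _ in range(num_in)]  # routing logits
    v = []
    for _ in range(num_iters):
        # Coupling coefficients: each input capsule distributes its vote
        # across the output capsules (each row sums to 1).
        c = [softmax(row) for row in b]
        v = []
        for j in range(num_out):
            s_j = [
                sum(c[i][j] * u_hat[i][j][d] for i in range(num_in))
                for d in range(dim)
            ]
            v.append(squash(s_j))
        # Raise the logit where a prediction agrees with the output capsule.
        for i in range(num_in):
            for j in range(num_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v


# Hypothetical sizes and random inputs, chosen only to exercise the routine.
random.seed(0)
NUM_IN, NUM_OUT, DIM = 12, 4, 6
u_hat = [
    [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(NUM_OUT)]
    for _ in range(NUM_IN)
]
capsules = dynamic_routing(u_hat)
lengths = [math.sqrt(sum(x * x for x in v)) for v in capsules]
predicted = max(range(NUM_OUT), key=lambda j: lengths[j])
print("capsule lengths:", lengths)
print("index of longest capsule:", predicted)
```

In a classifier built this way, the length of each output capsule is usually read as the confidence that the corresponding class (here, an emotion category) is present, so the longest capsule gives the prediction.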


Source journal

International Journal of Computing Computer Science-Computer Science (miscellaneous)

CiteScore

2.20

Self-citation rate

0.00%

Number of publications

Journal description: The International Journal of Computing was established in 2002 on the basis of the Branch Research Laboratory for Automated Systems and Networks, which was renamed the Research Institute of Intelligent Computer Systems in 2005. The Journal aims to publish papers with novel results in Computing Science, Computer Engineering, Information Technologies, Software Engineering, and Information Systems within the Journal's topics. The official language of the Journal is English; paper abstracts are also published in Ukrainian and Russian. The Journal is published quarterly. The Editorial Board consists of about 30 internationally recognized scientists.