CENN: Capsule-enhanced neural network with innovative metrics for robust speech emotion recognition

IF 7.2 | Zone 1 (Computer Science) | JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Knowledge-Based Systems Pub Date : 2024-09-07 DOI:10.1016/j.knosys.2024.112499

URL: https://www.sciencedirect.com/science/article/pii/S095070512401133X

Citations: 0

Abstract

Speech emotion recognition (SER) plays a pivotal role in enhancing Human-computer interaction (HCI) systems. This paper introduces a groundbreaking Capsule-enhanced neural network (CENN) that significantly advances the state of SER through a robust and reproducible deep learning framework. The CENN architecture seamlessly integrates advanced components, including Multi-head attention (MHA), residual module, and capsule module, which collectively enhance the model's capacity to capture both global and local features essential for precise emotion classification. A key contribution of this work is the development of a comprehensive reproducibility framework, featuring novel metrics: General learning reproducibility (GLR) and Correct learning reproducibility (CLR). These metrics, alongside their fractional and perfect variants, offer a multi-dimensional evaluation of the model's consistency and correctness across multiple executions, thereby ensuring the reliability and credibility of the results. To tackle the persistent challenge of overfitting in deep learning models, we propose an innovative overfitting metric that considers the intricate relationship between training and testing errors, model complexity, and data complexity. This metric, in conjunction with the newly introduced generalization and robustness metrics, provides a holistic assessment of the model's performance, guiding the application of regularization techniques to maintain generalizability and resilience. Extensive experiments conducted on benchmark SER datasets demonstrate that the CENN model not only surpasses existing approaches in terms of accuracy but also sets a new benchmark in reproducibility. This work establishes a new paradigm for deep learning model development in SER, underscoring the vital importance of reproducibility and offering a rigorous framework for future research.
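The abstract does not give formulas for GLR or CLR, so the following is only a minimal sketch of the general idea of measuring consistency and correctness across repeated executions. The function names `agreement_rate` and `correct_agreement_rate`, and the definitions behind them, are assumptions made for illustration; they are not the paper's metrics.

```python
from typing import Sequence


def agreement_rate(runs: Sequence[Sequence[int]]) -> float:
    """Fraction of samples on which every run predicts the same label.

    Hypothetical stand-in for a "consistency across executions" score;
    it ignores whether the shared prediction is correct.
    """
    n = len(runs[0])
    same = sum(1 for i in range(n) if len({run[i] for run in runs}) == 1)
    return same / n


def correct_agreement_rate(runs: Sequence[Sequence[int]],
                           labels: Sequence[int]) -> float:
    """Fraction of samples that every run predicts correctly.

    Hypothetical stand-in for a "consistent and correct" score.
    """
    n = len(labels)
    ok = sum(1 for i in range(n) if all(run[i] == labels[i] for run in runs))
    return ok / n


# Toy data: 4 utterances, 3 independent training runs (made-up predictions).
labels = [0, 1, 2, 1]
runs = [
    [0, 1, 2, 0],
    [0, 1, 1, 0],
    [0, 1, 2, 0],
]

general = agreement_rate(runs)                  # runs agree on samples 0, 1, 3
correct = correct_agreement_rate(runs, labels)  # all runs correct on samples 0, 1
print(general, correct)
```

In this toy case the runs agree on a wrong label for the last utterance, which is why a consistency-only score can exceed a consistency-and-correctness score.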


Source journal

Knowledge-Based Systems (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore: 14.80
Self-citation rate: 12.50%
Articles published: 1245
Review time: 7.8 months

Journal description: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems based on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, provide balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.