利用多元函数数据分析增强语音情感识别

IF 4.4 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data and Cognitive Computing Pub Date : 2023-08-25 DOI:10.3390/bdcc7030146

Matthieu Saumard

{"title":"利用多元函数数据分析增强语音情感识别","authors":"Matthieu Saumard","doi":"10.3390/bdcc7030146","DOIUrl":null,"url":null,"abstract":"Speech Emotions Recognition (SER) has gained significant attention in the fields of human–computer interaction and speech processing. In this article, we present a novel approach to improve SER performance by interpreting the Mel Frequency Cepstral Coefficients (MFCC) as a multivariate functional data object, which accelerates learning while maintaining high accuracy. To treat MFCCs as functional data, we preprocess them as images and apply resizing techniques. By representing MFCCs as functional data, we leverage the temporal dynamics of speech, capturing essential emotional cues more effectively. Consequently, this enhancement significantly contributes to the learning process of SER methods without compromising performance. Subsequently, we employ a supervised learning model, specifically a functional Support Vector Machine (SVM), directly on the MFCC represented as functional data. This enables the utilization of the full functional information, allowing for more accurate emotion recognition. The proposed approach is rigorously evaluated on two distinct databases, EMO-DB and IEMOCAP, serving as benchmarks for SER evaluation. Our method demonstrates competitive results in terms of accuracy, showcasing its effectiveness in emotion recognition. Furthermore, our approach significantly reduces the learning time, making it computationally efficient and practical for real-world applications. In conclusion, our novel approach of treating MFCCs as multivariate functional data objects exhibits superior performance in SER tasks, delivering both improved accuracy and substantial time savings during the learning process. This advancement holds great potential for enhancing human–computer interaction and enabling more sophisticated emotion-aware applications.","PeriodicalId":36397,"journal":{"name":"Big Data and Cognitive Computing","volume":" ","pages":""},"PeriodicalIF":4.4000,"publicationDate":"2023-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Enhancing Speech Emotions Recognition Using Multivariate Functional Data Analysis\",\"authors\":\"Matthieu Saumard\",\"doi\":\"10.3390/bdcc7030146\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Speech Emotions Recognition (SER) has gained significant attention in the fields of human–computer interaction and speech processing. In this article, we present a novel approach to improve SER performance by interpreting the Mel Frequency Cepstral Coefficients (MFCC) as a multivariate functional data object, which accelerates learning while maintaining high accuracy. To treat MFCCs as functional data, we preprocess them as images and apply resizing techniques. By representing MFCCs as functional data, we leverage the temporal dynamics of speech, capturing essential emotional cues more effectively. Consequently, this enhancement significantly contributes to the learning process of SER methods without compromising performance. Subsequently, we employ a supervised learning model, specifically a functional Support Vector Machine (SVM), directly on the MFCC represented as functional data. This enables the utilization of the full functional information, allowing for more accurate emotion recognition. The proposed approach is rigorously evaluated on two distinct databases, EMO-DB and IEMOCAP, serving as benchmarks for SER evaluation. Our method demonstrates competitive results in terms of accuracy, showcasing its effectiveness in emotion recognition. Furthermore, our approach significantly reduces the learning time, making it computationally efficient and practical for real-world applications. In conclusion, our novel approach of treating MFCCs as multivariate functional data objects exhibits superior performance in SER tasks, delivering both improved accuracy and substantial time savings during the learning process. This advancement holds great potential for enhancing human–computer interaction and enabling more sophisticated emotion-aware applications.\",\"PeriodicalId\":36397,\"journal\":{\"name\":\"Big Data and Cognitive Computing\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":4.4000,\"publicationDate\":\"2023-08-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Big Data and Cognitive Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/bdcc7030146\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Big Data and Cognitive Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/bdcc7030146","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 2

摘要

语音情绪识别(SER)在人机交互和语音处理领域受到广泛关注。在本文中，我们提出了一种通过将Mel频率倒谱系数(MFCC)解释为多元功能数据对象来提高SER性能的新方法，该方法在保持高精度的同时加速了学习。为了将mfc作为功能数据处理，我们将其预处理为图像并应用大小调整技术。通过将mfccc表示为功能数据，我们利用了语音的时间动态，更有效地捕获了基本的情感线索。因此，这种增强极大地促进了SER方法的学习过程，而不会影响性能。随后，我们采用监督学习模型，特别是功能支持向量机(SVM)，直接对表示为功能数据的MFCC进行学习。这使得充分利用功能信息，允许更准确的情绪识别。所提出的方法在两个不同的数据库(EMO-DB和IEMOCAP)上进行了严格的评估，作为SER评估的基准。我们的方法在准确性方面显示出竞争力，展示了其在情绪识别方面的有效性。此外，我们的方法显著减少了学习时间，使其计算效率高，适用于实际应用。总之，我们将mfccc作为多元功能数据对象的新方法在SER任务中表现出卓越的性能，在学习过程中既提高了准确性，又节省了大量时间。这一进步在增强人机交互和实现更复杂的情感感知应用方面具有巨大的潜力。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Enhancing Speech Emotions Recognition Using Multivariate Functional Data Analysis

Speech Emotions Recognition (SER) has gained significant attention in the fields of human–computer interaction and speech processing. In this article, we present a novel approach to improve SER performance by interpreting the Mel Frequency Cepstral Coefficients (MFCC) as a multivariate functional data object, which accelerates learning while maintaining high accuracy. To treat MFCCs as functional data, we preprocess them as images and apply resizing techniques. By representing MFCCs as functional data, we leverage the temporal dynamics of speech, capturing essential emotional cues more effectively. Consequently, this enhancement significantly contributes to the learning process of SER methods without compromising performance. Subsequently, we employ a supervised learning model, specifically a functional Support Vector Machine (SVM), directly on the MFCC represented as functional data. This enables the utilization of the full functional information, allowing for more accurate emotion recognition. The proposed approach is rigorously evaluated on two distinct databases, EMO-DB and IEMOCAP, serving as benchmarks for SER evaluation. Our method demonstrates competitive results in terms of accuracy, showcasing its effectiveness in emotion recognition. Furthermore, our approach significantly reduces the learning time, making it computationally efficient and practical for real-world applications. In conclusion, our novel approach of treating MFCCs as multivariate functional data objects exhibits superior performance in SER tasks, delivering both improved accuracy and substantial time savings during the learning process. This advancement holds great potential for enhancing human–computer interaction and enabling more sophisticated emotion-aware applications.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊