
IF 3.4 · Zone 3 (Medicine) · JCR Q1 ENGINEERING, MULTIDISCIPLINARY

Biomimetics · Pub Date: 2025-03-10 · DOI: 10.3390/biomimetics10030167

Cevahir Parlak

Volume 10, Issue 3 · Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11940085/pdf/

Citations: 0


Cochleogram-Based Speech Emotion Recognition with the Cascade of Asymmetric Resonators with Fast-Acting Compression Using Time-Distributed Convolutional Long Short-Term Memory and Support Vector Machines.

Feature extraction is a crucial stage in speech emotion recognition, and filter banks with their associated statistical functions are widely used for this purpose. Although Mel filters and MFCCs achieve outstanding results, they do not faithfully model the human ear, because they rely on a simplified mechanism to simulate the functioning of the cochlea. The Mel filter system is therefore not an accurate representation of human hearing but an engineering shortcut that suppresses pitch and low-frequency components, which are of little use in traditional speech recognition. Speech emotion recognition, however, depends heavily on exactly these pitch and low-frequency features. The newly tailored CARFAC 24 model (Cascade of Asymmetric Resonators with Fast-Acting Compression) is a sophisticated speech analysis system designed to simulate the functions of the human cochlea as closely as possible. In this study, we apply CARFAC 24 to speech emotion recognition and compare it with state-of-the-art systems in speaker-independent experiments, using Time-Distributed Convolutional LSTM networks and Support Vector Machines on the ASED and NEMO emotional speech datasets. The results demonstrate that CARFAC 24 is a valuable alternative to Mel and MFCC features in speech emotion recognition applications.
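For readers less familiar with the baseline, the Mel filter bank that the abstract contrasts with CARFAC 24 can be sketched in a few lines. This is a minimal, generic sketch, not the paper's feature pipeline: it assumes the common HTK-style mel formula m = 2595 · log10(1 + f / 700) and triangular filters, and the function names and parameter values (40 filters, 512-point FFT, 16 kHz) are illustrative assumptions. CARFAC itself is not implemented here.

```python
import numpy as np


def hz_to_mel(f_hz):
    """HTK-style mel scale (assumed here): m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Return an (n_filters, n_fft // 2 + 1) matrix of triangular filters
    whose edge and centre frequencies are equally spaced on the mel scale."""
    if f_max is None:
        f_max = sample_rate / 2.0
    n_bins = n_fft // 2 + 1
    # Centre frequency (Hz) of each FFT bin, from 0 Hz up to Nyquist.
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # n_filters + 2 points equally spaced in mel, mapped back to Hz.
    mel_points = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2)
    hz_points = mel_to_hz(mel_points)

    fbank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        left, centre, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (centre - left)
        falling = (right - bin_freqs) / (right - centre)
        # Triangle: rises from left to centre, falls from centre to right,
        # and is clipped to zero outside [left, right].
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank


fb = mel_filterbank(n_filters=40, n_fft=512, sample_rate=16000)
print(fb.shape)  # (40, 257)
```

Because the points are equally spaced in mel rather than in Hz, each successive filter spans a wider band in Hz and pools more FFT bins. This fixed, static frequency warping is the kind of simplification the abstract sets against CARFAC's cascade of asymmetric resonators with fast-acting compression.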
