IF 3.4 3区 医学 Q1 ENGINEERING, MULTIDISCIPLINARY
Cevahir Parlak
{"title":"Cochleogram-Based Speech Emotion Recognition with the Cascade of Asymmetric Resonators with Fast-Acting Compression Using Time-Distributed Convolutional Long Short-Term Memory and Support Vector Machines.","authors":"Cevahir Parlak","doi":"10.3390/biomimetics10030167","DOIUrl":null,"url":null,"abstract":"<p><p>Feature extraction is a crucial stage in speech emotion recognition applications, and filter banks with their related statistical functions are widely used for this purpose. Although Mel filters and MFCCs achieve outstanding results, they do not perfectly model the structure of the human ear, as they use a simplified mechanism to simulate the functioning of human cochlear structures. The Mel filters system is not a perfect representation of human hearing, but merely an engineering shortcut to suppress the pitch and low-frequency components, which have little use in traditional speech recognition applications. However, speech emotion recognition classification is heavily related to pitch and low-frequency component features. The newly tailored CARFAC 24 model is a sophisticated system for analyzing human speech and is designed to best simulate the functionalities of the human cochlea. In this study, we use the CARFAC 24 system for speech emotion recognition and compare it with state-of-the-art systems using speaker-independent studies conducted with Time-Distributed Convolutional LSTM networks and Support Vector Machines, with the use of the ASED and the NEMO emotional speech dataset. The results demonstrate that CARFAC 24 is a valuable alternative to Mel and MFCC features in speech emotion recognition applications.</p>","PeriodicalId":8907,"journal":{"name":"Biomimetics","volume":"10 3","pages":""},"PeriodicalIF":3.4000,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11940085/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biomimetics","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.3390/biomimetics10030167","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0

摘要

特征提取是语音情感识别应用中的一个关键阶段,滤波器组及其相关统计函数被广泛应用于这一目的。尽管 Mel 滤波器和 MFCC 取得了出色的效果,但它们并不能完美地模拟人耳的结构,因为它们使用了简化的机制来模拟人耳蜗结构的功能。梅尔滤波器系统并非人类听觉的完美代表,而只是抑制音高和低频成分的一种工程学捷径,在传统语音识别应用中用处不大。然而,语音情感识别分类与音高和低频成分特征密切相关。新定制的 CARFAC 24 模型是一个用于分析人类语音的复杂系统,其设计旨在最好地模拟人类耳蜗的功能。在本研究中,我们使用 CARFAC 24 系统进行语音情感识别,并通过使用时间分布卷积 LSTM 网络和支持向量机进行与说话者无关的研究,使用 ASED 和 NEMO 情感语音数据集将其与最先进的系统进行比较。研究结果表明,在语音情感识别应用中,CARFAC 24 是 Mel 和 MFCC 特征的重要替代品。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Cochleogram-Based Speech Emotion Recognition with the Cascade of Asymmetric Resonators with Fast-Acting Compression Using Time-Distributed Convolutional Long Short-Term Memory and Support Vector Machines.

Feature extraction is a crucial stage in speech emotion recognition applications, and filter banks with their related statistical functions are widely used for this purpose. Although Mel filters and MFCCs achieve outstanding results, they do not perfectly model the structure of the human ear, as they use a simplified mechanism to simulate the functioning of human cochlear structures. The Mel filters system is not a perfect representation of human hearing, but merely an engineering shortcut to suppress the pitch and low-frequency components, which have little use in traditional speech recognition applications. However, speech emotion recognition classification is heavily related to pitch and low-frequency component features. The newly tailored CARFAC 24 model is a sophisticated system for analyzing human speech and is designed to best simulate the functionalities of the human cochlea. In this study, we use the CARFAC 24 system for speech emotion recognition and compare it with state-of-the-art systems using speaker-independent studies conducted with Time-Distributed Convolutional LSTM networks and Support Vector Machines, with the use of the ASED and the NEMO emotional speech dataset. The results demonstrate that CARFAC 24 is a valuable alternative to Mel and MFCC features in speech emotion recognition applications.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Biomimetics
Biomimetics Biochemistry, Genetics and Molecular Biology-Biotechnology
CiteScore
3.50
自引率
11.10%
发文量
189
审稿时长
11 weeks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信