A filtering approach for speech emotion recognition using wavelet approximation coefficient

IF 5.2 | Zone 2, Engineering and Technology | JCR Q1, ENGINEERING, MULTIDISCIPLINARY
Ravi, Sachin Taran
Measurement, vol. 256, Article 118165. Published 2025-06-13. Journal article, not open access.
DOI: 10.1016/j.measurement.2025.118165
https://www.sciencedirect.com/science/article/pii/S0263224125015246
Citations: 0

Abstract

A filtering approach for speech emotion recognition using wavelet approximation coefficient
Understanding emotions from human speech signals is one of the most attractive areas of research. However, it is also challenging because language and accent influence speech. This work presents a solution for speech emotion recognition based on a wavelet approximation method: the Wavelet Approximation Speech Emotion Recognition (WaSER) model. The 'db4' wavelet is used to calculate the approximation coefficients. At the initial stage, silent segments of the raw speech signal are eliminated using wavelet approximation and energy thresholding, resulting in a filtered signal. Various experiments are performed to select the wavelet and optimize the threshold value. A set of prosodic and spectral features is then extracted from the filtered signal. The most promising features are selected using the ReliefF algorithm, and these selected features are used for further processing. An ablation experiment is also presented to demonstrate the potential of the proposed WaSER model. The WaSER model was tested on datasets in several languages, including RAVDESS, EMOVO, Emo-DB and ShEMO, attaining its highest accuracy of 94.6% on the ShEMO dataset. The ablation experiment shows that filtering the speech significantly enhanced accuracy across all datasets while maintaining a balance between accuracy and sensitivity. The proposed model also drastically reduces the response time compared to the existing state of the art.
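The silence-removal stage described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract names the 'db4' wavelet and energy thresholding but gives no decomposition level, frame length, or threshold value, so the single decomposition level, `frame_len`, `rel_threshold`, and the function names `approximation` and `remove_silence` are all assumptions. The filter is written out in plain Python to keep the sketch self-contained; in practice a wavelet library such as PyWavelets would normally be used, and its boundary handling would differ from the zero padding assumed here.

```python
import math

# db4 low-pass decomposition filter (8 taps); the taps sum to sqrt(2).
DB4_DEC_LO = [
    -0.010597401784997278, 0.032883011666982945,
    0.030841381835986965, -0.18703481171888114,
    -0.02798376941698385, 0.6308807679295904,
    0.7148465705525415, 0.23037781330885523,
]


def approximation(signal, taps=DB4_DEC_LO):
    """One DWT level: low-pass filter, then keep every second sample.

    Samples before the start of the signal are treated as zero
    (an assumption), so the first few coefficients show a boundary effect.
    """
    out = []
    for n in range(1, len(signal), 2):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def remove_silence(signal, frame_len=64, rel_threshold=0.1):
    """Drop frames whose approximation energy is below a fraction of the peak.

    frame_len counts approximation coefficients; after one decomposition
    level each coefficient spans two input samples. Both parameters are
    illustrative assumptions, not values from the paper.
    """
    approx = approximation(signal)
    energies = [
        sum(c * c for c in approx[i:i + frame_len])
        for i in range(0, len(approx), frame_len)
    ]
    peak = max(energies, default=0.0)
    kept = []
    for idx, energy in enumerate(energies):
        if peak > 0 and energy >= rel_threshold * peak:
            start = 2 * idx * frame_len  # map frame back to input samples
            kept.extend(signal[start:start + 2 * frame_len])
    return kept


if __name__ == "__main__":
    # Synthetic test signal: silence, a low-frequency tone, silence.
    tone = [math.sin(math.pi * j / 20) for j in range(1024)]
    noisy = [0.0] * 1024 + tone + [0.0] * 1024
    filtered = remove_silence(noisy)
    print(len(noisy), "samples in,", len(filtered), "samples kept")
```

A relative threshold (a fraction of the loudest frame's energy) is used here so the sketch does not depend on the recording level; an absolute threshold tuned per dataset would be an equally plausible reading of the abstract.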
Source journal
Measurement (Engineering and Technology: Engineering, Multidisciplinary)
CiteScore: 10.20
Self-citation rate: 12.50%
Articles published: 1589
Review time: 12.1 months
Journal description: Contributions are invited on novel achievements in all fields of measurement and instrumentation science and technology. Authors are encouraged to submit novel material whose ultimate goal is an advancement in the state of the art of: measurement and metrology fundamentals, sensors, measurement instruments, measurement and estimation techniques, measurement data processing and fusion algorithms, evaluation procedures and methodologies for plants and industrial processes, performance analysis of systems, processes and algorithms, mathematical models for measurement-oriented purposes, and distributed measurement systems in a connected world.