A filtering approach for speech emotion recognition using wavelet approximation coefficient
Authors: Ravi, Sachin Taran
Journal: Measurement, volume 256, Article 118165 (published 2025-06-13)
Impact factor: 5.2; JCR: Q1 (Engineering, Multidisciplinary)
DOI: 10.1016/j.measurement.2025.118165
URL: https://www.sciencedirect.com/science/article/pii/S0263224125015246
Understanding emotions from human speech signals is one of the most attractive areas of research. It is also challenging, because language and accent influence speech. This work presents a solution for speech emotion recognition based on a wavelet approximation method: the Wavelet Approximation Speech Emotion Recognition (WaSER) model. The 'db4' wavelet is used to calculate the approximation coefficient. At the initial stage, silent segments of the raw speech signal are eliminated using wavelet approximation and energy thresholding, resulting in a filtered signal. Various experiments are performed to select the wavelet and optimize the threshold value. A set of prosodic and spectral features is then extracted from the filtered signal. The most relevant features are selected using the ReliefF algorithm, and these selected features are used for further processing. An ablation experiment is also presented to demonstrate the potential of the proposed WaSER model. The WaSER model was tested on datasets covering different languages, including RAVDESS, EMOVO, Emo-DB and ShEMO, and reached its highest accuracy, 94.6%, on the ShEMO dataset. The ablation experiment shows that filtering the speech significantly improves accuracy across all datasets while maintaining a balance between accuracy and sensitivity. The proposed model also drastically reduces the response time compared to the existing state of the art.
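The silence-removal stage described in the abstract can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: the abstract does not give the decomposition level, frame length, threshold rule, or boundary handling, so the single-level decomposition, zero padding, the `frame_len` and `rel_threshold` parameters, and the function names are all assumptions made here. Only the use of 'db4' approximation coefficients and an energy threshold comes from the source.

```python
# Low-pass decomposition filter of the Daubechies-4 ('db4') wavelet.
DB4_DEC_LO = [
    -0.010597401784997278, 0.032883011666982945,
    0.030841381835986965, -0.18703481171888114,
    -0.02798376941698385, 0.6308807679295904,
    0.7148465705525415, 0.23037781330885523,
]


def approximation_coefficients(signal):
    """Level-1 approximation: low-pass filter, then keep every second output.

    Samples before the start of the signal are treated as zero
    (an assumed boundary rule, chosen for simplicity).
    """
    coeffs = []
    for m in range(0, len(signal), 2):
        acc = 0.0
        for k, h in enumerate(DB4_DEC_LO):
            if m - k >= 0:
                acc += h * signal[m - k]
        coeffs.append(acc)
    return coeffs


def remove_silence(signal, frame_len=64, rel_threshold=0.05):
    """Keep only raw-signal frames whose approximation energy is high enough.

    frame_len and rel_threshold are hypothetical values; the paper reports
    tuning its threshold experimentally.
    """
    if frame_len < 2 or frame_len % 2:
        raise ValueError("frame_len must be a positive even number")
    approx = approximation_coefficients(signal)
    half = frame_len // 2  # each coefficient summarizes two raw samples
    energies = []
    for start in range(0, len(approx), half):
        frame = approx[start:start + half]
        energies.append(sum(c * c for c in frame) / len(frame))
    if not energies:
        return []
    # Assumed rule: a frame is silent if its mean energy is at most a
    # fixed fraction of the loudest frame's mean energy.
    cutoff = rel_threshold * max(energies)
    kept = []
    for i, energy in enumerate(energies):
        if energy > cutoff:
            kept.extend(signal[i * frame_len:(i + 1) * frame_len])
    return kept
```

Measuring energy on the approximation coefficients rather than on the raw samples means high-frequency noise contributes less to the silence decision, which is one plausible motivation for the approach. In practice a wavelet library would replace the hand-written filter loop.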
About the journal:
Contributions are invited on novel achievements in all fields of measurement and instrumentation science and technology. Authors are encouraged to submit novel material, whose ultimate goal is an advancement in the state of the art of: measurement and metrology fundamentals, sensors, measurement instruments, measurement and estimation techniques, measurement data processing and fusion algorithms, evaluation procedures and methodologies for plants and industrial processes, performance analysis of systems, processes and algorithms, mathematical models for measurement-oriented purposes, distributed measurement systems in a connected world.