Analysis and Research on Spectrogram-Based Emotional Speech Signal Augmentation Algorithm.

IF 2.1, Zone 3 (Physics and Astrophysics), Q2, PHYSICS, MULTIDISCIPLINARY
Entropy Pub Date : 2025-06-15 DOI:10.3390/e27060640
Huawei Tao, Sixian Li, Xuemei Wang, Binkun Liu, Shuailong Zheng

Abstract

Data augmentation techniques are widely applied in speech emotion recognition to increase the diversity of data and enhance the performance of models. However, existing research has not deeply explored the impact of these data augmentation techniques on emotional data. Inappropriate augmentation algorithms may distort emotional labels, thereby reducing the performance of models. To address this issue, in this paper we systematically evaluate the influence of common data augmentation algorithms on emotion recognition from three dimensions: (1) we design subjective auditory experiments to intuitively demonstrate the impact of augmentation algorithms on the emotional expression of speech; (2) we jointly extract multi-dimensional features from spectrograms based on the Librosa library and analyze the impact of data augmentation algorithms on the spectral features of speech signals through heatmap visualization; and (3) we objectively evaluate the recognition performance of the model by means of indicators such as cross-entropy loss and introduce statistical significance analysis to verify the effectiveness of the augmentation algorithms. The experimental results show that "time stretching" may distort speech features, affect the attribution of emotional labels, and significantly reduce the model's accuracy. In contrast, "reverberation" (RIR) and "resampling" within a limited range have the least impact on emotional data, enhancing the diversity of samples. Moreover, their combination can increase accuracy by up to 7.1%, providing a basis for optimizing data augmentation strategies.
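The two augmentations the abstract reports as least harmful to emotional content, reverberation (RIR) and resampling within a limited range, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the synthetic exponentially decaying impulse response, the 0.3 s decay time, the linear-interpolation resampler, and the resampling factor are all hypothetical choices made here, since the abstract gives none of these parameters. The sketch uses NumPy only to stay dependency-free; a real pipeline would more likely use measured room impulse responses and a band-limited resampler such as `librosa.resample`.

```python
import numpy as np


def synthetic_rir(sr, rt60=0.3, seed=0):
    """Hypothetical room impulse response: white noise with exponential decay.

    rt60 is the (assumed) time in seconds for the tail to decay by 60 dB.
    """
    rng = np.random.default_rng(seed)
    n = int(round(sr * rt60))
    t = np.arange(n) / sr
    # A 60 dB drop is an amplitude factor of 1e-3 at t == rt60.
    decay = 10.0 ** (-3.0 * t / rt60)
    rir = rng.standard_normal(n) * decay
    rir[0] = 1.0  # direct path
    return rir / np.linalg.norm(rir)


def add_reverb(y, rir):
    """Convolve the signal with the RIR, trim to the input length,
    and rescale so the peak level matches the dry signal."""
    wet = np.convolve(y, rir)[: len(y)]
    peak = np.max(np.abs(wet))
    if peak == 0:
        return wet
    return wet / peak * np.max(np.abs(y))


def resample_linear(y, factor):
    """Resample by linear interpolation (a simplification, no anti-aliasing).

    The output has round(len(y) * factor) samples. Played back at the
    original rate, factor > 1 slows the speech and lowers its pitch.
    """
    n_out = int(round(len(y) * factor))
    x_old = np.arange(len(y))
    x_new = np.linspace(0, len(y) - 1, n_out)
    return np.interp(x_new, x_old, y)


def augment(y, sr, factor=1.05):
    """Combine both augmentations: resample first, then add reverb."""
    return add_reverb(resample_linear(y, factor), synthetic_rir(sr))
```

Keeping `factor` close to 1 reflects the abstract's "within a limited range" wording; the value 1.05 is only a placeholder, and the range the paper actually tested is not stated in the abstract.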

Source journal
Entropy (PHYSICS, MULTIDISCIPLINARY)
CiteScore: 4.90
Self-citation rate: 11.10%
Articles published: 1580
Review time: 21.05 days
Journal description: Entropy (ISSN 1099-4300), an international and interdisciplinary journal of entropy and information studies, publishes reviews, regular research papers and short notes. Its aim is to encourage scientists to publish their theoretical and experimental details as fully as possible. There is no restriction on the length of papers. Where computations or experiments are involved, the details must be provided so that the results can be reproduced.