Towards Domain-Specific Cross-Corpus Speech Emotion Recognition Approach

IF 4.5 | CAS Tier 2 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, CYBERNETICS)
Yan Zhao;Yuan Zong;Hailun Lian;Cheng Lu;Jingang Shi;Wenming Zheng
{"title":"面向特定领域的跨语料库语音情感识别方法","authors":"Yan Zhao;Yuan Zong;Hailun Lian;Cheng Lu;Jingang Shi;Wenming Zheng","doi":"10.1109/TCSS.2024.3483964","DOIUrl":null,"url":null,"abstract":"Cross-corpus speech emotion recognition (SER) poses a challenge due to feature distribution mismatch between the training and testing speech samples, potentially degrading the performance of established SER methods. In this article, we tackle this challenge by proposing a novel transfer subspace learning method called acoustic knowledge-guided transfer linear regression (AKTLR). Unlike existing approaches, which often overlook domain-specific knowledge related to SER and simply treat cross-corpus SER as a generic transfer learning task, our AKTLR method is built upon a well-designed acoustic knowledge-guided dual sparsity constraint mechanism. This mechanism emphasizes the potential of minimalistic acoustic parameter feature sets to alleviate classifier over-adaptation, which is empirically validated acoustic knowledge in SER, enabling superior generalization in cross-corpus SER tasks compared to using large feature sets. Through this mechanism, we extend a simple transfer linear regression model to AKTLR. This extension harnesses its full capability to seek emotion-discriminative and corpus-invariant features from established acoustic parameter feature sets used for describing speech signals across two scales: contributive acoustic parameter groups and constituent elements within each contributive group. We evaluate our method through extensive cross-corpus SER experiments on three widely used speech emotion corpora: EmoDB, eNTERFACE, and CASIA. The proposed AKTLR achieves an average UAR of 42.12% across six tasks using the eGeMAPS feature set, outperforming many recent state-of-the-art transfer subspace learning and deep transfer learning methods. This demonstrates the effectiveness and superior performance of our approach. Furthermore, our work provides experimental evidence supporting the feasibility and superiority of incorporating domain-specific knowledge into the transfer learning model to address cross-corpus SER tasks.","PeriodicalId":13044,"journal":{"name":"IEEE Transactions on Computational Social Systems","volume":"12 5","pages":"2130-2143"},"PeriodicalIF":4.5000,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Towards Domain-Specific Cross-Corpus Speech Emotion Recognition Approach\",\"authors\":\"Yan Zhao;Yuan Zong;Hailun Lian;Cheng Lu;Jingang Shi;Wenming Zheng\",\"doi\":\"10.1109/TCSS.2024.3483964\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Cross-corpus speech emotion recognition (SER) poses a challenge due to feature distribution mismatch between the training and testing speech samples, potentially degrading the performance of established SER methods. In this article, we tackle this challenge by proposing a novel transfer subspace learning method called acoustic knowledge-guided transfer linear regression (AKTLR). Unlike existing approaches, which often overlook domain-specific knowledge related to SER and simply treat cross-corpus SER as a generic transfer learning task, our AKTLR method is built upon a well-designed acoustic knowledge-guided dual sparsity constraint mechanism. 
This mechanism emphasizes the potential of minimalistic acoustic parameter feature sets to alleviate classifier over-adaptation, which is empirically validated acoustic knowledge in SER, enabling superior generalization in cross-corpus SER tasks compared to using large feature sets. Through this mechanism, we extend a simple transfer linear regression model to AKTLR. This extension harnesses its full capability to seek emotion-discriminative and corpus-invariant features from established acoustic parameter feature sets used for describing speech signals across two scales: contributive acoustic parameter groups and constituent elements within each contributive group. We evaluate our method through extensive cross-corpus SER experiments on three widely used speech emotion corpora: EmoDB, eNTERFACE, and CASIA. The proposed AKTLR achieves an average UAR of 42.12% across six tasks using the eGeMAPS feature set, outperforming many recent state-of-the-art transfer subspace learning and deep transfer learning methods. This demonstrates the effectiveness and superior performance of our approach. Furthermore, our work provides experimental evidence supporting the feasibility and superiority of incorporating domain-specific knowledge into the transfer learning model to address cross-corpus SER tasks.\",\"PeriodicalId\":13044,\"journal\":{\"name\":\"IEEE Transactions on Computational Social Systems\",\"volume\":\"12 5\",\"pages\":\"2130-2143\"},\"PeriodicalIF\":4.5000,\"publicationDate\":\"2024-11-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Computational Social Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10750408/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, CYBERNETICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computational Social Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10750408/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, CYBERNETICS","Score":null,"Total":0}
Citations: 0

Abstract

Cross-corpus speech emotion recognition (SER) is challenging because the feature distributions of training and testing speech samples are mismatched, which can degrade the performance of established SER methods. In this article, we tackle this challenge by proposing a novel transfer subspace learning method called acoustic knowledge-guided transfer linear regression (AKTLR). Unlike existing approaches, which often overlook domain-specific knowledge related to SER and simply treat cross-corpus SER as a generic transfer learning task, our AKTLR method is built upon a well-designed acoustic knowledge-guided dual sparsity constraint mechanism. This mechanism draws on empirically validated acoustic knowledge in SER: minimalistic acoustic parameter feature sets can alleviate classifier over-adaptation and therefore generalize better in cross-corpus SER tasks than large feature sets. Through this mechanism, we extend a simple transfer linear regression model into AKTLR, which seeks emotion-discriminative and corpus-invariant features from established acoustic parameter feature sets at two scales: contributive acoustic parameter groups and the constituent elements within each contributive group. We evaluate our method through extensive cross-corpus SER experiments on three widely used speech emotion corpora: EmoDB, eNTERFACE, and CASIA. With the eGeMAPS feature set, AKTLR achieves an average unweighted average recall (UAR) of 42.12% across six tasks, outperforming many recent state-of-the-art transfer subspace learning and deep transfer learning methods, which demonstrates the effectiveness and superior performance of our approach. Furthermore, our work provides experimental evidence for the feasibility and superiority of incorporating domain-specific knowledge into a transfer learning model to address cross-corpus SER tasks.
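
The abstract specifies AKTLR only at a high level, so the sketch below is an illustrative approximation rather than the authors' actual algorithm: it models the dual sparsity idea with a sparse-group-lasso penalty (a group term over acoustic parameter groups plus an element-wise term within them) and the corpus-invariance idea with a simple mean-discrepancy term between source and target corpora, optimized by proximal gradient descent. The function names, group boundaries, and hyperparameters are all hypothetical.

```python
# Illustrative sketch only (not the paper's exact AKTLR objective): dual sparsity
# is approximated by a sparse group lasso penalty, corpus invariance by a
# first-order (mean) discrepancy term. All names and defaults are hypothetical.
import numpy as np

def prox_sparse_group(w, groups, lam1, lam2, step):
    """Proximal operator of step * (lam1 * ||w||_1 + lam2 * sum_g ||w_g||_2)."""
    # Element-wise soft thresholding: sparsity within each acoustic group.
    w = np.sign(w) * np.maximum(np.abs(w) - step * lam1, 0.0)
    # Group-wise shrinkage: whole non-contributive groups are zeroed out.
    for g in groups:
        norm = np.linalg.norm(w[g])
        if norm > 0.0:
            w[g] *= max(0.0, 1.0 - step * lam2 / norm)
    return w

def fit_sparse_transfer_regression(Xs, Ys, Xt, groups,
                                   lam1=0.01, lam2=0.05, mu=0.1, n_iter=500):
    """Minimize ||Xs W - Ys||_F^2 + mu * ||d W||_2^2 + dual sparsity penalty,
    where d = mean(Xs) - mean(Xt) encourages corpus-invariant projections.

    Xs: (ns, p) source features; Ys: (ns, c) one-hot labels;
    Xt: (nt, p) unlabeled target features; groups: list of index arrays.
    """
    p, c = Xs.shape[1], Ys.shape[1]
    W = np.zeros((p, c))
    d = Xs.mean(axis=0, keepdims=True) - Xt.mean(axis=0, keepdims=True)
    # Step size from a crude Lipschitz bound on the smooth part's gradient.
    step = 1.0 / (2.0 * (np.linalg.norm(Xs, 2) ** 2 + mu * np.linalg.norm(d) ** 2))
    for _ in range(n_iter):
        grad = 2.0 * Xs.T @ (Xs @ W - Ys) + 2.0 * mu * d.T @ (d @ W)
        W -= step * grad
        for j in range(c):
            W[:, j] = prox_sparse_group(W[:, j], groups, lam1, lam2, step)
    return W

# Hypothetical usage: 88 eGeMAPS functionals split into made-up groups.
rng = np.random.default_rng(0)
Xs, Xt = rng.standard_normal((200, 88)), rng.standard_normal((150, 88))
Ys = np.eye(4)[rng.integers(0, 4, size=200)]     # one-hot labels, 4 emotions
groups = [np.arange(0, 25), np.arange(25, 62), np.arange(62, 88)]
W = fit_sparse_transfer_regression(Xs, Ys, Xt, groups)
pred = (Xt @ W).argmax(axis=1)                   # predicted target emotions
# The reported UAR metric is the unweighted mean of per-class recalls; given
# true target labels yt it would be recall_score(yt, pred, average="macro").
```

The proximal step mirrors the two scales named in the abstract: the group-norm shrinkage can discard an entire non-contributive acoustic parameter group, while the element-wise threshold keeps only a few parameters inside each surviving group.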
Source Journal
IEEE Transactions on Computational Social Systems
Scopus subject area: Social Sciences (miscellaneous)
CiteScore: 10.00
Self-citation rate: 20.00%
Annual publications: 316
About the journal: IEEE Transactions on Computational Social Systems focuses on such topics as modeling, simulation, analysis, and understanding of social systems from the quantitative and/or computational perspective. "Systems" include man-man, man-machine, and machine-machine organizations and adversarial situations, as well as social media structures and their dynamics. More specifically, the transactions publishes articles on modeling the dynamics of social systems, methodologies for incorporating and representing socio-cultural and behavioral aspects in computational modeling, analysis of social system behavior and structure, and paradigms for social systems modeling and simulation. The journal also features articles on social network dynamics, social intelligence and cognition, social systems design and architectures, socio-cultural modeling and representation, and computational behavior modeling, together with their applications.