Yan Zhao;Yuan Zong;Hailun Lian;Cheng Lu;Jingang Shi;Wenming Zheng
{"title":"面向特定领域的跨语料库语音情感识别方法","authors":"Yan Zhao;Yuan Zong;Hailun Lian;Cheng Lu;Jingang Shi;Wenming Zheng","doi":"10.1109/TCSS.2024.3483964","DOIUrl":null,"url":null,"abstract":"Cross-corpus speech emotion recognition (SER) poses a challenge due to feature distribution mismatch between the training and testing speech samples, potentially degrading the performance of established SER methods. In this article, we tackle this challenge by proposing a novel transfer subspace learning method called acoustic knowledge-guided transfer linear regression (AKTLR). Unlike existing approaches, which often overlook domain-specific knowledge related to SER and simply treat cross-corpus SER as a generic transfer learning task, our AKTLR method is built upon a well-designed acoustic knowledge-guided dual sparsity constraint mechanism. This mechanism emphasizes the potential of minimalistic acoustic parameter feature sets to alleviate classifier over-adaptation, which is empirically validated acoustic knowledge in SER, enabling superior generalization in cross-corpus SER tasks compared to using large feature sets. Through this mechanism, we extend a simple transfer linear regression model to AKTLR. This extension harnesses its full capability to seek emotion-discriminative and corpus-invariant features from established acoustic parameter feature sets used for describing speech signals across two scales: contributive acoustic parameter groups and constituent elements within each contributive group. We evaluate our method through extensive cross-corpus SER experiments on three widely used speech emotion corpora: EmoDB, eNTERFACE, and CASIA. The proposed AKTLR achieves an average UAR of 42.12% across six tasks using the eGeMAPS feature set, outperforming many recent state-of-the-art transfer subspace learning and deep transfer learning methods. This demonstrates the effectiveness and superior performance of our approach. Furthermore, our work provides experimental evidence supporting the feasibility and superiority of incorporating domain-specific knowledge into the transfer learning model to address cross-corpus SER tasks.","PeriodicalId":13044,"journal":{"name":"IEEE Transactions on Computational Social Systems","volume":"12 5","pages":"2130-2143"},"PeriodicalIF":4.5000,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Towards Domain-Specific Cross-Corpus Speech Emotion Recognition Approach\",\"authors\":\"Yan Zhao;Yuan Zong;Hailun Lian;Cheng Lu;Jingang Shi;Wenming Zheng\",\"doi\":\"10.1109/TCSS.2024.3483964\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Cross-corpus speech emotion recognition (SER) poses a challenge due to feature distribution mismatch between the training and testing speech samples, potentially degrading the performance of established SER methods. In this article, we tackle this challenge by proposing a novel transfer subspace learning method called acoustic knowledge-guided transfer linear regression (AKTLR). Unlike existing approaches, which often overlook domain-specific knowledge related to SER and simply treat cross-corpus SER as a generic transfer learning task, our AKTLR method is built upon a well-designed acoustic knowledge-guided dual sparsity constraint mechanism. 
This mechanism emphasizes the potential of minimalistic acoustic parameter feature sets to alleviate classifier over-adaptation, which is empirically validated acoustic knowledge in SER, enabling superior generalization in cross-corpus SER tasks compared to using large feature sets. Through this mechanism, we extend a simple transfer linear regression model to AKTLR. This extension harnesses its full capability to seek emotion-discriminative and corpus-invariant features from established acoustic parameter feature sets used for describing speech signals across two scales: contributive acoustic parameter groups and constituent elements within each contributive group. We evaluate our method through extensive cross-corpus SER experiments on three widely used speech emotion corpora: EmoDB, eNTERFACE, and CASIA. The proposed AKTLR achieves an average UAR of 42.12% across six tasks using the eGeMAPS feature set, outperforming many recent state-of-the-art transfer subspace learning and deep transfer learning methods. This demonstrates the effectiveness and superior performance of our approach. Furthermore, our work provides experimental evidence supporting the feasibility and superiority of incorporating domain-specific knowledge into the transfer learning model to address cross-corpus SER tasks.\",\"PeriodicalId\":13044,\"journal\":{\"name\":\"IEEE Transactions on Computational Social Systems\",\"volume\":\"12 5\",\"pages\":\"2130-2143\"},\"PeriodicalIF\":4.5000,\"publicationDate\":\"2024-11-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Computational Social Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10750408/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, CYBERNETICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computational Social Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10750408/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, CYBERNETICS","Score":null,"Total":0}
Towards Domain-Specific Cross-Corpus Speech Emotion Recognition Approach
Cross-corpus speech emotion recognition (SER) is challenging because the feature distributions of the training and testing speech samples are mismatched, which can degrade the performance of established SER methods. In this article, we tackle this challenge by proposing a novel transfer subspace learning method called acoustic knowledge-guided transfer linear regression (AKTLR). Unlike existing approaches, which often overlook domain-specific knowledge related to SER and simply treat cross-corpus SER as a generic transfer learning task, AKTLR is built upon a carefully designed acoustic knowledge-guided dual sparsity constraint mechanism. This mechanism draws on empirically validated acoustic knowledge in SER: minimalistic acoustic parameter feature sets alleviate classifier over-adaptation and therefore generalize better in cross-corpus SER tasks than large feature sets. Through this mechanism, we extend a simple transfer linear regression model into AKTLR, which seeks emotion-discriminative and corpus-invariant features from established acoustic parameter feature sets at two scales: contributive acoustic parameter groups and the constituent elements within each contributive group. We evaluate the method through extensive cross-corpus SER experiments on three widely used speech emotion corpora: EmoDB, eNTERFACE, and CASIA. The proposed AKTLR achieves an average unweighted average recall (UAR) of 42.12% across six tasks using the eGeMAPS feature set, outperforming many recent state-of-the-art transfer subspace learning and deep transfer learning methods and demonstrating the effectiveness of our approach. Furthermore, our work provides experimental evidence for the feasibility and advantage of incorporating domain-specific knowledge into the transfer learning model to address cross-corpus SER tasks.
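The abstract does not spell out the AKTLR objective, but its described ingredients (a transfer linear regression backbone, alignment between source and target corpora, and a dual sparsity penalty over acoustic parameter groups and over individual features within those groups) can be sketched as follows. This is a minimal illustrative sketch under assumptions, not the authors' implementation: the exact loss, the mean-based alignment term, the sequential proximal steps, and all names (fit_aktlr_sketch, lam_group, lam_elem, mu, and the hypothetical feature grouping) are our own.

```python
# Minimal sketch of a transfer linear regression with a dual (group-level and
# element-level) sparsity penalty, in the spirit of AKTLR as described in the
# abstract. All modeling choices below are assumptions made for illustration.
import numpy as np

def prox_group(W, groups, t):
    """Group-wise soft thresholding: can zero out whole acoustic parameter groups."""
    W = W.copy()
    for g in groups:                                  # g: list of feature (row) indices
        norm = np.linalg.norm(W[g])
        W[g] *= max(0.0, 1.0 - t / (norm + 1e-12))
    return W

def prox_rows(W, t):
    """Row-wise (L2,1) soft thresholding: sparsity over individual features."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W * np.maximum(0.0, 1.0 - t / (norms + 1e-12))

def fit_aktlr_sketch(Xs, Ys, Xt, groups, lam_group=0.1, lam_elem=0.01,
                     mu=1.0, lr=1e-3, n_iter=500):
    """Proximal-gradient sketch: least-squares regression on the source corpus,
    a first-order (mean) alignment term between source and target features, and
    two sparsity penalties applied as sequential prox steps (an approximation,
    since the prox of a sum of norms is not the composition of their proxes)."""
    d, c = Xs.shape[1], Ys.shape[1]
    W = np.zeros((d, c))
    for _ in range(n_iter):
        grad = Xs.T @ (Xs @ W - Ys)                   # grad of 0.5*||Xs W - Ys||^2
        diff = Xs.mean(axis=0) - Xt.mean(axis=0)      # grad of 0.5*mu*||diff^T W||^2
        grad += mu * np.outer(diff, diff @ W)
        W = W - lr * grad
        W = prox_group(W, groups, lr * lam_group)     # group-level sparsity
        W = prox_rows(W, lr * lam_elem)               # element-level sparsity
    return W

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Xs = rng.normal(size=(200, 88))                   # 88-dim, matching the eGeMAPS functionals set
    Xt = rng.normal(size=(150, 88))
    Ys = np.eye(5)[rng.integers(0, 5, 200)]           # one-hot labels, 5 emotions (illustrative)
    groups = [list(range(i, i + 8)) for i in range(0, 88, 8)]  # hypothetical parameter grouping
    W = fit_aktlr_sketch(Xs, Ys, Xt, groups)
    pred = (Xt @ W).argmax(axis=1)                    # predicted target-corpus labels
    print("first predictions:", pred[:10])
    print("nonzero feature rows:", int((np.linalg.norm(W, axis=1) > 1e-8).sum()))
```

In this reading, the group-level prox selects contributive acoustic parameter groups, while the row-level prox selects constituent elements within them, mirroring the two scales mentioned in the abstract; the paper's actual formulation and optimizer may differ.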
Journal introduction:
IEEE Transactions on Computational Social Systems focuses on such topics as modeling, simulation, analysis, and understanding of social systems from the quantitative and/or computational perspective. "Systems" include man-man, man-machine, and machine-machine organizations and adversarial situations, as well as social media structures and their dynamics. More specifically, the transactions publishes articles on modeling the dynamics of social systems, methodologies for incorporating and representing socio-cultural and behavioral aspects in computational modeling, analysis of social system behavior and structure, and paradigms for social systems modeling and simulation. The journal also features articles on social network dynamics, social intelligence and cognition, social systems design and architectures, socio-cultural modeling and representation, and computational behavior modeling, together with their applications.