Cross-Lingual Transfer for Russian Speech Emotion Automatic Recognition: Data and Trends

Impact factor: 0.5; JCR quartile: Q4; Category: Computer Science, Information Systems

AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS; Pub Date: 2025-08-26; DOI: 10.3103/S000510552570058X

V. I. Lemaev, N. V. Lukashevich

Volume 59, issue 3, pages 166-176. Publication type: Journal Article. Full text: https://link.springer.com/article/10.3103/S000510552570058X

Citations: 0

Abstract


Cross-Lingual Transfer for Russian Speech Emotion Automatic Recognition: Data and Trends

This paper studies how differences between languages and training data affect the quality of cross-lingual transfer of a trained speech model to Russian in automatic speech emotion recognition. English, Polish, Chinese, and Japanese served as source languages at the training stage, with the IEMOCAP, nEMO, ESD, and JVNV emotional speech datasets used for each, respectively; the model was HuBERT, a transformer-based speech model. Each model, trained on its corresponding dataset, was tested on a reduced sample of the Russian emotional speech dataset Dusha. From the results, the paper considers the main trends in choosing a source language for training the speech model and then transferring it to Russian, and it analyzes differences between the datasets, which point to the need for further work on collecting and labeling high-quality emotional speech data.
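The evaluation protocol in the abstract (one model per source language, all of them tested on the same Russian sample) can be sketched roughly as below. This is a minimal illustration and not the authors' code: the shared label set, the label mapping, the use of macro-averaged recall as the metric, and the toy predictions are all assumptions, since the abstract does not say which labels or metric were used.

```python
from collections import defaultdict

# Hypothetical mapping from dataset-specific label names to one shared set.
# These names are illustrative assumptions, not taken from the paper.
LABEL_MAP = {
    "anger": "angry",
    "angry": "angry",
    "sadness": "sad",
    "sad": "sad",
    "neutral": "neutral",
    "happy": "positive",
    "happiness": "positive",
    "positive": "positive",
}


def harmonize(labels):
    """Map raw labels to the shared set; unmapped labels become None."""
    return [LABEL_MAP.get(label) for label in labels]


def macro_recall(gold, predicted):
    """Unweighted mean of per-class recall over the classes present in gold."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for g, p in zip(gold, predicted):
        if g is None:
            continue  # skip utterances whose label has no shared equivalent
        totals[g] += 1
        if g == p:
            hits[g] += 1
    if not totals:
        raise ValueError("no usable gold labels")
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def transfer_scores(target_gold, predictions_by_source):
    """Score every source-language model on the same target-language test set."""
    gold = harmonize(target_gold)
    return {
        source: macro_recall(gold, harmonize(preds))
        for source, preds in predictions_by_source.items()
    }


# Toy data, invented purely to exercise the functions above.
toy_gold = ["angry", "sad", "neutral", "positive"]
toy_predictions = {
    "English": ["anger", "sad", "neutral", "happy"],
    "Japanese": ["anger", "anger", "happy", "happy"],
}
toy_scores = transfer_scores(toy_gold, toy_predictions)
```

Macro-averaged recall is chosen here only because it is insensitive to class imbalance, which tends to differ between emotional speech datasets; any other per-utterance metric could be substituted in `macro_recall` without changing the rest of the sketch.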


Source journal

AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS (Computer Science, Information Systems)

Self-citation rate

40.00%

Articles published

About the journal: Automatic Documentation and Mathematical Linguistics is an international peer-reviewed journal that covers all aspects of the automation of information processes and systems, as well as algorithms and methods for automatic language analysis. Its emphasis is on practical applications of new technologies and techniques for information analysis and processing.