Cross-Lingual Transfer for Russian Speech Emotion Automatic Recognition: Data and Trends
V. I. Lemaev, N. V. Lukashevich
Automatic Documentation and Mathematical Linguistics, vol. 59, no. 3, pp. 166–176 (published 2025-08-26). DOI: 10.3103/S000510552570058X
This paper studies how differences in source language and training data affect the quality of cross-lingual transfer of a trained speech model to Russian for automatic speech emotion recognition. English, Polish, Chinese, and Japanese served as the source languages, with the IEMOCAP, nEMO, ESD, and JVNV emotional speech datasets used for them, respectively; the model was HuBERT, a transformer-based speech model. Each model, trained on its source dataset, was tested on a reduced subset of the Russian emotional speech dataset Dusha. Based on the results, the paper discusses the main trends in choosing a source language for training the speech model and transferring it to Russian, and analyzes the differences between the datasets, which point to the need for further work on collecting and labeling high-quality emotional speech data.
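The evaluation protocol in the abstract (train on one source-language corpus, then test on the Russian Dusha subset) can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the authors' code: the `load`, `train`, and `predict` callables are hypothetical placeholders (in the paper, training means fine-tuning HuBERT), the four-class `SHARED_LABELS` set is hypothetical, and unweighted accuracy (mean per-class recall) is used only as an example metric, since the abstract does not say which label mapping or metrics the paper uses.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# Source language -> training corpus, as listed in the abstract.
SOURCE_CORPORA: Dict[str, str] = {
    "English": "IEMOCAP",
    "Polish": "nEMO",
    "Chinese": "ESD",
    "Japanese": "JVNV",
}
TARGET_CORPUS = "Dusha"  # Russian test corpus (the paper uses a reduced subset)

# HYPOTHETICAL shared label set; the paper's actual label harmonization is not shown here.
SHARED_LABELS: Tuple[str, ...] = ("angry", "happy", "neutral", "sad")


def unweighted_accuracy(y_true: Sequence[str], y_pred: Sequence[str],
                        labels: Sequence[str] = SHARED_LABELS) -> float:
    """Mean per-class recall over `labels`; classes with no test samples are skipped."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for true, pred in zip(y_true, y_pred):
        totals[true] += 1
        if true == pred:
            hits[true] += 1
    recalls = [hits[c] / totals[c] for c in labels if totals[c] > 0]
    return sum(recalls) / len(recalls) if recalls else 0.0


def run_transfer_grid(
    load: Callable[[str], Tuple[List, List[str]]],  # HYPOTHETICAL corpus loader
    train: Callable[[List, List[str]], object],     # HYPOTHETICAL: e.g. fine-tune HuBERT
    predict: Callable[[object, List], List[str]],   # HYPOTHETICAL inference step
) -> Dict[str, float]:
    """Train one model per source corpus, then score each on the Russian target corpus."""
    x_test, y_test = load(TARGET_CORPUS)
    scores: Dict[str, float] = {}
    for language, corpus in SOURCE_CORPORA.items():
        x_train, y_train = load(corpus)
        model = train(x_train, y_train)
        scores[language] = unweighted_accuracy(y_test, predict(model, x_test))
    return scores
```

Stub callables (for example, a "model" that always predicts `neutral`) are enough to exercise the loop. Unweighted accuracy is chosen for the sketch because emotion corpora are often class-imbalanced, so plain accuracy can reward a model that mostly predicts the majority class.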
About the journal:
Automatic Documentation and Mathematical Linguistics is an international peer-reviewed journal that covers all aspects of automation of information processes and systems, as well as algorithms and methods for automatic language analysis. Emphasis is on the practical applications of new technologies and techniques for information analysis and processing.