Influence of the surprisal power adjustment on spoken word duration in emotional speech in Serbian

IF 3.4 | Zone 3 (Computer Science) | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

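Computer Speech and Language, Volume 94, Article 101803. Published 2025-04-11. DOI: 10.1016/j.csl.2025.101803. Available at https://www.sciencedirect.com/science/article/pii/S0885230825000282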

Jelena Lazić, Sanja Vujnović


Citations: 0

Abstract


Emotional speech analysis has been a topic of interest across multiple disciplines. However, it remains challenging because of its complexity and multimodality, and computer systems still struggle to handle emotional speech robustly. Although this is a difficult area of research, its wide range of potential applications, especially in the current era of intelligent agents and humanoid systems, has led to increased interest in the field. The development of machine learning has produced a variety of novel techniques, including pre-trained language models. In this work, we used these models to study emotional speech from an information-theoretic perspective. Specifically, we analyzed language processing difficulty, measured by word-level spoken duration, and its relation to the distribution of information over speech, measured by word-level surprisal values. We analyzed a dataset of audio recordings in Serbian, a low-resource language, recorded under five different emotional states of the speakers. Seven state-of-the-art machine learning language models were employed to estimate surprisal values, which were then used as predictors of word-level spoken duration. Our results supported related studies in English and indicated that surprisal values estimated by machine learning models may be good predictors of speech parameters in Serbian. Furthermore, adjusting the power of the surprisal values led to different outcomes for different emotional states. This suggests that the role of surprisal in speech production may differ across emotional conditions.
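The abstract does not give the exact formulation, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that word surprisal is -log2 p(word | context), that a word split into several subword tokens receives the sum of its token surprisals (chain rule), and that the "power adjustment" means using surprisal raised to an exponent k as the predictor in a simple linear regression of spoken word duration, with R² used to compare exponents. The helper names (`word_surprisal`, `fit_ols`, `best_power`) and all numbers are hypothetical; in practice the token probabilities would come from a pre-trained language model.

```python
import math

# Minimal sketch (hypothetical helper names, not the authors' code).
# Assumptions: surprisal is -log2 p(word | context); a word split into several
# subword tokens gets the sum of its token surprisals (chain rule); the
# "power adjustment" is modeled as using surprisal**k as the predictor in a
# simple linear regression of spoken word duration, compared by R^2.


def word_surprisal(token_probs, base=2.0):
    """Surprisal of one word from its subword-token conditional probabilities."""
    return -sum(math.log(p, base) for p in token_probs)


def fit_ols(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


def best_power(surprisals, durations, powers):
    """Fit duration ~ a + b * surprisal**k for each k; return best k and all R^2."""
    r2_by_power = {
        k: fit_ols([s ** k for s in surprisals], durations)[2] for k in powers
    }
    best_k = max(r2_by_power, key=r2_by_power.get)
    return best_k, r2_by_power


if __name__ == "__main__":
    # Toy numbers for illustration only, not data from the study.
    token_probs_per_word = ([0.5], [0.25], [0.5, 0.25], [0.0625])
    surprisals = [word_surprisal(p) for p in token_probs_per_word]
    # Durations (seconds) are built with k = 2, so that power should score highest.
    durations = [0.2 + 0.05 * s ** 2 for s in surprisals]
    k, scores = best_power(surprisals, durations, powers=[0.5, 1.0, 1.5, 2.0])
    print("best power:", k)
    for power, r2 in scores.items():
        print(f"k={power}: R^2={r2:.4f}")
```

Comparing in-sample R² over a grid of exponents is only one simple way to judge a power adjustment; fitting separate models per emotional state, or scoring held-out data, would be natural variations of the same idea.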


Source journal

Computer Speech and Language (category: Engineering and Technology, Computer Science: Artificial Intelligence)

CiteScore: 11.30

Self-citation rate: 4.70%

Review time: 22.9 weeks

Journal description: Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language. The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.