Expectation of speech style improves audio-visual perception of English vowels
Joan A. Sereno, Allard Jongman, Yue Wang, Paul Tupper, Dawn M. Behne, Jetic Gu, Haoyao Ruan
Speech Communication, Volume 171, Article 103243. Published 2025-04-17. DOI: 10.1016/j.specom.2025.103243
https://www.sciencedirect.com/science/article/pii/S0167639325000585
Abstract
Speech perception is influenced by both signal-internal properties and signal-independent knowledge, including communicative expectations. This study investigates how these two factors interact, focusing on the role of speech style expectations. Specifically, we examine how prior knowledge about speech style (clear versus plain speech) affects word identification and speech style judgment. Native English perceivers were presented with English words containing tense versus lax vowels in either clear or plain speech, with trial conditions manipulating whether style prompts (presented immediately prior to the target word) were congruent or incongruent with the actual speech style. The stimuli were also presented in three input modalities: auditory (speaker voice), visual (speaker face), and audio-visual. Results show that prior knowledge of speech style improved accuracy in identifying style after the session when style information in the prompt and target word was consistent, particularly in auditory and audio-visual modalities. Additionally, as expected, clear speech enhanced word intelligibility compared to plain speech, with benefits more evident for tense vowels and in auditory and audio-visual contexts. These results demonstrate that congruent style prompts improve style identification accuracy by aligning with high-level expectations, while clear speech enhances word identification accuracy due to signal-internal modifications. Overall, the current findings suggest an interplay of processing sources of information which are both signal-driven and signal-independent, and that high-level signal-complementary information such as speech style is not separate from, but is embodied in, the signal.
About the journal:
Speech Communication is an interdisciplinary journal whose primary objective is to fulfil the need for the rapid dissemination and thorough discussion of basic and applied research results.
The journal's primary objectives are:
• to present a forum for the advancement of human and human-machine speech communication science;
• to stimulate cross-fertilization between different fields of this domain;
• to contribute towards the rapid and wide diffusion of scientifically sound contributions in this domain.