Speech Communication: Latest Articles

Towards robust heart failure detection in digital telephony environments by utilizing transformer-based codec inversion
IF 2.4 | CAS Tier 3 | Computer Science
Speech Communication Pub Date: 2025-07-15 DOI: 10.1016/j.specom.2025.103279
Saska Tirronen, Farhad Javanmardi, Hilla Pohjalainen, Sudarsana Reddy Kadiri, Kiran Reddy Mittapalle, Pyry Helkkula, Kasimir Kaitue, Mikko Minkkinen, Heli Tolppanen, Tuomo Nieminen, Paavo Alku
This study introduces the Codec Transformer Network (CTN) to enhance the reliability of automatic heart failure (HF) detection from coded telephone speech by addressing codec-related challenges in digital telephony. The study specifically addresses the codec mismatch between training and inference in HF detection. CTN is designed to map the mel-spectrogram representations of encoded speech signals back to their original, non-encoded forms, thereby recovering HF-related discriminative information. The effectiveness of CTN is demonstrated in conjunction with three HF detectors based on Support Vector Machine, Random Forest, and K-Nearest Neighbors classifiers. The results show that CTN effectively retrieves the discriminative information between patients and controls, and performs comparably to or better than a baseline approach based on multi-condition training. (Speech Communication, Volume 173, Article 103279)
Citations: 0
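The codec-mismatch setup the abstract describes can be made concrete with a toy numerical sketch: features distorted by an unknown "codec" transform are mapped back toward their clean versions by a learned inverse. This is only a minimal numpy stand-in (a linear least-squares map on random data), not the paper's transformer or real mel-spectrograms; a real telephone codec is nonlinear and far more complex.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for mel-spectrogram frames: clean features X and a "codec"
# that applies an unknown linear distortion plus noise. (Assumption: this
# only illustrates the train/inference mismatch setup and its inversion.)
X = rng.normal(size=(500, 8))                  # clean frames
A = rng.normal(size=(8, 8))                    # unknown "codec" distortion
Xc = X @ A + 0.01 * rng.normal(size=X.shape)   # coded frames

# "Inversion" model: a least-squares map from coded frames back to clean
# ones, analogous in spirit to what the transformer-based CTN learns.
W, *_ = np.linalg.lstsq(Xc, X, rcond=None)
Xr = Xc @ W                                    # recovered frames

err_coded = np.mean((Xc - X) ** 2)       # mismatch before inversion
err_recovered = np.mean((Xr - X) ** 2)   # mismatch after inversion
```

A downstream detector trained on clean frames would then see `Xr` rather than the mismatched `Xc` at inference time, which is the point of codec inversion.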
Multimodal speech emotion recognition via modality constraint with hierarchical bottleneck feature fusion
IF 2.4 | CAS Tier 3 | Computer Science
Speech Communication Pub Date: 2025-07-10 DOI: 10.1016/j.specom.2025.103278
Ying Wang, Jianjun Lei, Xiangwei Zhu, Tao Zhang
Multimodal approaches can combine different channels of information simultaneously to improve modeling capability. Many recent studies focus on overcoming challenges arising from inter-modal conflicts and incomplete intra-modal learning in multimodal architectures. In this paper, we propose a scalable multimodal speech emotion recognition (SER) framework incorporating a hierarchical bottleneck feature (HBF) fusion approach. Furthermore, we design an intra-modal and inter-modal contrastive learning mechanism that enables self-supervised calibration of both modality-specific and cross-modal feature distributions. This approach achieves adaptive feature fusion and alignment while significantly reducing reliance on rigid feature alignment constraints. Meanwhile, by restricting the learning paths of the modality encoders, we design a modality representation constraint (MRC) method to mitigate conflicts between modalities. We also present a modality bargaining (MB) strategy that facilitates learning within modalities through a mechanism of mutual bargaining and balance, which avoids suboptimal modal representations by letting the different modalities alternate in driving the learning process. Our training strategies enable the architecture to perform well on multimodal emotion datasets such as CREMA-D, IEMOCAP, and MELD. Finally, we conduct extensive experiments to demonstrate the effectiveness of the proposed architecture with various modal encoders and different modality combination methods. (Speech Communication, Volume 173, Article 103278)
Citations: 0
Non-native (Czech and Russian L1) auditor assessments of some English suprasegmental features: Prominence and pitch accents
IF 2.4 | CAS Tier 3 | Computer Science
Speech Communication Pub Date: 2025-07-10 DOI: 10.1016/j.specom.2025.103281
Alexey Tymbay
This study reports a comparative perceptual experiment investigating the ability of Russian and Czech advanced learners of English to identify prominence in spoken English. Two groups of non-native annotators completed prominence-marking tasks on English monologues both before and after a 12-week phonological training program. The study employed three annotation techniques: Rapid Prosody Transcription (RPT), traditional (British), and ToBI. While the RPT annotations produced by the focus groups did not reach statistical equivalence with those of native English speakers, the data indicate a significant improvement in the perception and categorization of prominence following phonological training. A recurrent difficulty in both groups was the accurate identification of prenuclear prominence, attributed to prosodic transfer effects from the participants' first languages, Russian and Czech. The study highlights that systemic, phonetic, and distributional differences in the realization of prominence between L1 and L2 may hinder accurate perceptual judgments in English. It further posits that Russian and Czech speakers rely on different acoustic cues for prominence marking in their native languages, and that these cue-weighting strategies are transferred to English. Nevertheless, the results demonstrate that targeted phonological instruction can substantially enhance L2 learners' perceptual sensitivity to English prosody. (Speech Communication, Volume 173, Article 103281)
Citations: 0
Comparisons of Mandarin on-focus expansion and post-focus compression between native speakers and L2 learners: Production and machine learning classification
IF 2.4 | CAS Tier 3 | Computer Science
Speech Communication Pub Date: 2025-07-09 DOI: 10.1016/j.specom.2025.103280
Jing Wu, Jun Liu, Ting Wang, Sunghye Cho, Yong-cheol Lee
Korean and Mandarin are both reported to mark prosodic focus with on-focus expansion and post-focus compression. It is not clear whether Korean L2 learners of Mandarin benefit from this prosodic similarity when producing focused tones, or encounter difficulty due to the interaction between tone and intonation in a tonal language. This study examined the prosodic focus of Korean L2 learners of Mandarin through a production experiment, followed by the development of a machine learning classifier to automatically detect learners' production of focused elements. Learners were divided into two groups by proficiency level (advanced and intermediate) and were directly compared with native Mandarin speakers. Production results showed that intermediate-level speakers did not show any systematic modulation for focus marking. Although the advanced-level speakers performed better than the intermediate group, their prosodic effects of focus differed significantly from those of native speakers in both focus and post-focus positions. The machine learning classification of focused elements reflected clear focus-cueing differences among the three groups: the accuracy rate was about 86% for the native speakers, 49% for the advanced learners, and about 34% for the intermediate learners. The results suggest that on-focus expansion and post-focus compression are not automatically transferred across languages, even when those languages share similar acoustic correlates of prosodic focus. The study also underscores that the difficulty in acquiring the prosodic structure of a tone language lies mainly in mastering the tones themselves, a hurdle for learners from non-tonal languages that leads to ineffective realization of on-focus expansion and post-focus compression. (Speech Communication, Volume 173, Article 103280)
Citations: 0
Lightweight online punctuation and capitalization restoration for streaming ASR systems
IF 2.4 | CAS Tier 3 | Computer Science
Speech Communication Pub Date: 2025-07-05 DOI: 10.1016/j.specom.2025.103269
Martin Polacek, Petr Cerva, Jindrich Zdansky
This work proposes a lightweight online approach to automatic punctuation and capitalization restoration (APCR). Our method takes pure text as input and can be utilized in real-time speech transcription systems, e.g., for live captioning of TV or radio streams. We develop and evaluate it in a series of consecutive experiments, starting with the task of automatic punctuation restoration (APR), within which we also compare our results to another real-time APR method that combines textual and acoustic features. The test data used for this purpose contain automatic transcripts of radio talks and TV debates. In the second part of the paper, we extend our method to the task of automatic capitalization restoration (ACR). The resulting approach uses two consecutive ELECTRA-small models complemented by simple classification heads: the first model restores punctuation, while the second performs capitalization. Our complete system restores question marks, commas, periods, and capitalization with a very short inference time and a low latency of just four words. We evaluate its performance for Czech and German, and also compare its results to those of an existing APCR system for English. We are also publishing the data used for our evaluation and testing. (Speech Communication, Volume 173, Article 103269)
Citations: 0
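The final step of such an APCR pipeline, applying predicted per-token punctuation and capitalization labels to raw ASR output, can be sketched in a few lines. The label scheme and function below are hypothetical simplifications of our own devising; in the paper the labels come from two stacked ELECTRA-based token classifiers.

```python
def restore(tokens, punct_labels, cap_labels):
    """Attach predicted punctuation and apply capitalization to ASR tokens.

    punct_labels: one of "", ",", ".", "?" appended after each token.
    cap_labels:   "U" capitalizes the token's first letter, "O" leaves it.
    (Hypothetical label inventory, for illustration only.)
    """
    out = []
    for tok, p, c in zip(tokens, punct_labels, cap_labels):
        if c == "U":
            tok = tok[:1].upper() + tok[1:]
        out.append(tok + p)
    return " ".join(out)


print(restore(["hello", "how", "are", "you"],
              ["", "", "", "?"],
              ["U", "O", "O", "O"]))   # -> Hello how are you?
```

In a streaming setting the labels would be emitted with a small lookahead, which is where the four-word latency the abstract mentions comes from.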
Exploring the nuances of reduction in conversational speech: lexicalized and non-lexicalized reductions
IF 2.4 | CAS Tier 3 | Computer Science
Speech Communication Pub Date: 2025-06-25 DOI: 10.1016/j.specom.2025.103268
Kübra Bodur, Corinne Fredouille, Stéphane Rauzy, Christine Meunier
In spoken language, a significant proportion of words are produced with missing or underspecified segments, a phenomenon known as reduction. In this study, we distinguish two types of reductions in spontaneous speech: lexicalized reductions, which are well-documented, regularly occurring forms driven primarily by lexical processes, and non-lexicalized reductions, which occur irregularly and lack consistent patterns or representations. The latter are inherently more difficult to detect, and existing methods struggle to capture their full range.
We introduce a novel bottom-up approach for detecting potential reductions in French conversational speech, complemented by a top-down method focused on detecting previously known reduced forms. Our bottom-up method targets sequences of at least six phonemes produced within a 230 ms window, identifying temporally condensed segments indicative of reduction.
Our findings reveal significant variability in reduction patterns across the corpus. Lexicalized reductions displayed relatively stable and consistent ratios, whereas non-lexicalized reductions varied substantially and were strongly influenced by speaker characteristics. Notably, gender had a significant effect on non-lexicalized reductions, with male speakers showing higher reduction ratios, while no such effect was observed for lexicalized reductions. The two reduction types were also influenced differently by speaking time and articulation rate, and a positive correlation between lexicalized and non-lexicalized reduction ratios suggested speaker-specific tendencies.
Non-lexicalized reductions showed a higher prevalence of certain phonemes and word categories, whereas lexicalized reductions were more closely linked to morpho-syntactic roles. In a focused investigation of selected lexicalized items, we found that "tu sais" was more frequently reduced when functioning as a discourse marker than when used as a pronoun + verb construction. These results support the interpretation that lexicalized reductions are integrated into the mental lexicon, while non-lexicalized reductions are more context-dependent, further supporting the distinction between the two types of reductions. (Speech Communication, Volume 173, Article 103268)
Citations: 0
Prosodic modulation of discourse markers: A cross-linguistic analysis of conversational dynamics
IF 2.4 | CAS Tier 3 | Computer Science
Speech Communication Pub Date: 2025-06-21 DOI: 10.1016/j.specom.2025.103271
Yi Shan
This paper examines the interplay of prosody and pragmatics in discourse markers (DMs). The field has moved well beyond early structural approaches toward dynamic models that reveal how prosody shapes DM interpretation in spoken discourse. The review surveys research methods ranging from acoustic analysis to naturalistic observation, each offering insight into how intonation, stress, and rhythm interact with DMs to guide conversation. Recent cross-linguistic studies, such as Ahn et al. (2024) on Korean "nay mali" and Wang et al. (2024) on Mandarin "haole", demonstrate how prosodic detachment and contextual cues facilitate the evolution of DMs from lexical to pragmatic functions, underscoring the interplay between prosody and discourse management. Further cross-linguistic evidence comes from Vercher's (2023) analysis of Spanish "entonces" and Siebold's (2021) study of German "dann", which highlight language-specific prosodic realizations of DMs in turn management and conversational closings. Cross-linguistic patterns reveal both universal trends and language-specific characteristics, with cultural context playing a crucial role in prosodic analysis. Machine learning and AI now allow prosodic features to be analyzed in massive datasets with unprecedented precision, and multimodal analysis combines prosody with non-verbal cues for a more holistic understanding of DMs in face-to-face communication. These findings have real-world applications, from improving speech recognition to enhancing language teaching methods. Looking ahead, the paper advocates an integrated approach that considers the dynamic interplay between prosody, pragmatics, and social context, with much left to explore across linguistic boundaries and diverse communicative settings. The review thus offers not only a state-of-the-art overview but a roadmap for future research in the field. (Speech Communication, Volume 173, Article 103271)
Citations: 0
Automatic speech recognition technology to evaluate an audiometric word recognition test: A preliminary investigation
IF 2.4 | CAS Tier 3 | Computer Science
Speech Communication Pub Date: 2025-06-20 DOI: 10.1016/j.specom.2025.103270
Ayden M. Cauchi, Jaina Negandhi, Sharon L. Cushing, Karen A. Gordon
This study investigated the ability of machine learning systems to score a clinical speech perception test in which monosyllabic words are heard and repeated by a listener. The accuracy score is used in audiometric assessments, including cochlear implant candidacy and monitoring. Scoring is performed by clinicians who listen to and judge responses, which can create inter-rater variability and takes clinical time. A machine learning approach could support this testing by providing increased reliability and time efficiency, particularly in children. This study focused on the Phonetically Balanced Kindergarten (PBK) word list. Spoken responses (n = 1200) were recorded from 12 adults with normal hearing. These words were presented to 3 automatic speech recognizers (Whisper large, Whisper medium, Ursa) and 7 humans in 7 conditions: unaltered or, to simulate potential speech errors, altered by first or last consonant deletion or low-pass filtering at 1, 2, 4, and 6 kHz (n = 6972 altered responses). Responses were scored as the same as or different from the unaltered target. The data revealed that the automatic speech recognizers (ASRs) correctly classified unaltered words similarly to human evaluators across conditions [mean ± 1 SE: Whisper large = 88.20% ± 1.52%; Whisper medium = 81.20% ± 1.52%; Ursa = 90.70% ± 1.52%; humans = 91.80% ± 2.16%], [F(3, 3866.2) = 23.63, p < 0.001]. Classifications different from the unaltered target occurred most frequently in the first-consonant-deletion and 1 kHz filtering conditions. Fleiss kappa metrics showed that the ASRs displayed higher agreement than the human evaluators on both unaltered (ASRs = 0.69; humans = 0.17) and altered (ASRs = 0.56; humans = 0.51) PBK words. These results support the further development of automatic speech recognition systems for speech perception testing. (Speech Communication, Volume 173, Article 103270)
Citations: 0
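Fleiss' kappa, the agreement metric reported in this study, generalizes chance-corrected agreement to any fixed number of raters per item. A minimal pure-Python implementation of the standard formula (illustrative, not the authors' code):

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each rated by the same number
    of raters. `ratings` is a list of per-item lists of category labels,
    e.g. [["same", "same", "diff"], ...]."""
    n = len(ratings[0])   # raters per item
    N = len(ratings)      # number of items
    cats = sorted({c for row in ratings for c in row})
    # per-item counts of each category
    counts = [[Counter(row)[c] for c in cats] for row in ratings]
    # per-item observed agreement, averaged over items
    P_i = [(sum(x * x for x in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P_i) / N
    # chance agreement from the marginal category proportions
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(len(cats))]
    P_e = sum(p * p for p in p_j)
    return (P_bar - P_e) / (1 - P_e)
```

With labels like "same"/"different" per word, this yields the kind of per-panel agreement values (e.g. 0.69 for the ASRs on unaltered words) quoted above.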
Speech stimulus continuum synthesis using deep learning methods
IF 2.4 | CAS Tier 3 | Computer Science
Speech Communication Pub Date: 2025-06-17 DOI: 10.1016/j.specom.2025.103266
Zhu Li, Yuqing Zhang, Yanlu Xie
Creating a naturalistic speech stimulus continuum (i.e., a series of stimuli equally spaced along a specific acoustic dimension between two given categories) is an indispensable component of categorical perception studies. A common method is to manually modify the key acoustic parameter of speech sounds, yet the quality of such synthetic speech remains unsatisfying. This work explores how to use deep learning techniques for speech stimulus continuum synthesis, with the aim of improving the naturalness of the synthesized continuum. Drawing on recent advances in speech disentanglement learning, we implement a supervised disentanglement framework based on adversarial training (AT) to separate a specific acoustic feature (e.g., fundamental frequency, formant features) from the other contents of the speech signal, and achieve controllable stimulus generation by sampling from the latent space of the key acoustic feature. In addition, drawing on the idea of mutual information (MI) from information theory, we design an unsupervised MI-based disentanglement framework for the same separation task. Experiments on stimulus generation for several continua validate the effectiveness of the proposed method in both objective and subjective evaluations. (Speech Communication, Volume 173, Article 103266)
Citations: 0
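The "equally spaced" requirement is straightforward to make concrete: given two category endpoints on one acoustic dimension, intermediate stimuli sit at uniform intervals between them. A trivial helper of our own (illustrative only; the paper's contribution is producing natural-sounding speech at each step, which this does not attempt):

```python
def stimulus_continuum(start, end, steps):
    """Equally spaced values (inclusive of both endpoints) along one
    acoustic dimension, e.g. f0 in Hz between two tonal categories."""
    step = (end - start) / (steps - 1)
    return [start + i * step for i in range(steps)]


# e.g. a 5-step f0 continuum from 100 Hz to 200 Hz
print(stimulus_continuum(100.0, 200.0, 5))  # -> [100.0, 125.0, 150.0, 175.0, 200.0]
```

In the disentanglement framework, the analogous operation is sampling such evenly spaced points in the latent space of the key acoustic feature rather than in the raw parameter itself.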
The perception of intonational peaks and valleys: The effects of plateaux, declination and experimental task
IF 2.4 | CAS Tier 3 | Computer Science
Speech Communication Pub Date: 2025-06-10 DOI: 10.1016/j.specom.2025.103267
Hae-Sung Jeon
An experiment assessed listeners' judgement of either relative pitch height or prominence between two consecutive fundamental frequency (f_o) peaks or valleys in speech. The f_o contour of the first peak or valley was kept constant, while the second was orthogonally manipulated in its height and plateau duration. Half of the stimuli had a flat baseline from which the peaks and valleys were scaled, while the other half had an overtly declining baseline. The results replicated the previous finding that f_o peaks with a long plateau are salient to listeners, while valleys are hard to process even with a plateau. Furthermore, the effect of declination depended on the experimental task. Listeners' responses appeared to be directly affected by the size of the f_o excursion only when judging relative height between two peaks, while their prominence judgements were strongly affected by the overall impression of the pitch-raising or pitch-lowering event near the perceptual target. The findings suggest that the global f_o contour, not a single representative f_o value of an intonational event, should be considered in perceptual models of intonation, and they show an interplay between the signal, listeners' top-down expectations, and speech perception. (Speech Communication, Volume 173, Article 103267)
Citations: 0