Phonetica, Pub Date: 2026-05-07, DOI: 10.1515/phon-2025-0037
Jose A Mompean
{"title":"From 'preferred' to 'peripheral'? A century-long study of linking-r in Standard Southern British English (1920s-2020s).","authors":"Jose A Mompean","doi":"10.1515/phon-2025-0037","DOIUrl":"https://doi.org/10.1515/phon-2025-0037","url":null,"abstract":"<p><p>This paper presents a diachronic trend study of linking-r in non-rhotic Standard Southern British English (SSBE) as a sound change in progress, examining its evolution over an entire 104-year span from the 1920s to the 2020s. The study investigates the changing role of linking-r and glottalization as strategies for resolving hiatus, reflecting broader phonological and sociolinguistic shifts in SSBE. Utilizing a corpus of recorded speech comprising 4,180 potential cases of linking-r (and a smaller set of potential intrusive-r cases), the analysis includes a balanced representation across different decades and gender groups, involving 312 distinct speakers. Acoustic and auditory analyses were conducted using PRAAT to determine whether linking-r, glottalization, or hiatus was used in each potential case. The findings indicate a significant decline in the use of linking-r over time, with glottalization emerging as the predominant hiatus-resolution strategy. 
The findings have implications for our broader understanding of the hiatus resolution system in SSBE and other varieties of English, the broader phonological shift towards the use of glottalization in many accents of English, and the loss of rhoticity in those varieties.</p>","PeriodicalId":55608,"journal":{"name":"Phonetica","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2026-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147846202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Phonetica, Pub Date: 2026-05-06, DOI: 10.1515/phon-2025-0039
Matthew C Kelley
{"title":"Gradient boundaries through confidence intervals for forced alignment estimates using model ensembles.","authors":"Matthew C Kelley","doi":"10.1515/phon-2025-0039","DOIUrl":"https://doi.org/10.1515/phon-2025-0039","url":null,"abstract":"<p><p>Forced alignment is a common tool to align audio with orthographic and phonetic transcriptions. Most forced alignment tools provide only point-estimates of boundaries. The present project introduces a method of producing gradient boundaries by deriving confidence intervals using neural network ensembles. Ten different segment classifier neural networks were previously trained, and the alignment process is repeated with each classifier. The ensemble is then used to place the point-estimate of a boundary at the median of the boundaries in the ensemble, and the gradient range is placed using a 97.85 % confidence interval around the median constructed using order statistics. Gradient boundaries are taken here as a more realistic representation of how segments transition into each other. Moreover, the range indicates the model uncertainty in the boundary placement, facilitating tasks like finding boundaries that should be reviewed. As a bonus, on the Buckeye and TIMIT corpora, the ensemble boundaries show a slight overall improvement over using just a single model. The gradient boundaries can be emitted during alignment as JSON files and a main table for programmatic and statistical analysis. 
For familiarity, they are also output as Praat TextGrids using a point tier to represent the edges of the boundary regions.</p>","PeriodicalId":55608,"journal":{"name":"Phonetica","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2026-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147846212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
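The 97.85 % figure quoted in the abstract above is consistent with a distribution-free interval for the median taken between the 2nd and 9th of ten ordered boundary estimates. The sketch below only checks that arithmetic. The choice of ranks, the function names, and the boundary times are assumptions for illustration and are not taken from the paper's implementation.

```python
from math import comb
from statistics import median


def median_ci_coverage(n: int, k: int) -> float:
    """Coverage of the interval [x_(k), x_(n-k+1)] for the median.

    Assumes n independent draws from a continuous distribution, so the
    number of draws below the true median is Binomial(n, 0.5).
    """
    # Probability that fewer than k draws fall below the median.
    tail = sum(comb(n, i) for i in range(k)) / 2 ** n
    # The interval misses the median if either tail event happens.
    return 1 - 2 * tail


def gradient_boundary(estimates, k=2):
    """Return (lower, point, upper) for one boundary from ensemble estimates."""
    ordered = sorted(estimates)
    # k-th smallest and k-th largest values bound the gradient region.
    return ordered[k - 1], median(ordered), ordered[len(ordered) - k]


coverage = median_ci_coverage(10, 2)

# Hypothetical boundary times (seconds) from ten aligners; not real data.
times = [0.512, 0.498, 0.505, 0.520, 0.501, 0.509, 0.495, 0.507, 0.503, 0.515]
lower, point, upper = gradient_boundary(times)
print(f"coverage={coverage:.4f} lower={lower} point={point} upper={upper}")
```

With ten estimates, using the minimum and maximum instead (k=1) would give a wider interval with higher coverage, so the rank choice trades interval width against confidence.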
Phonetica, Pub Date: 2026-04-27, DOI: 10.1515/phon-2025-0052
Qian Zhang, Jason A Shaw
{"title":"Articulatory adaptations to the effect of nasality on vowel formants in Xinfeng Tieshikou Hakka.","authors":"Qian Zhang, Jason A Shaw","doi":"10.1515/phon-2025-0052","DOIUrl":"https://doi.org/10.1515/phon-2025-0052","url":null,"abstract":"<p><p>Articulatory compensation for mismatches between expected and perceived formants appears to be a fundamental component of speech production. We pursue the hypothesis that this component of speech production accounts for observed differences in articulation between oral and nasal vowel pairs in natural language production. The empirical focus is Xinfeng Tieshikou Hakka (XTH). We report acoustic analyses of three oral-nasal vowel pairs, [i]-[ĩ], [e]-[ẽ], [a]-[ã], from eight speakers as well as electromagnetic articulography of two oral-nasal vowel pairs, [e]-[ẽ], [a]-[ã], from one of these speakers. Building on past work, which has successfully isolated the independent influence of velopharyngeal (VP) coupling on vowel formants, we evaluate whether there is articulatory adaptation that conspires to maintain similar formants across oral and nasal vowel pairs. Acoustic results reveal significant compensatory articulation in XTH [ẽ], whereas [ĩ] exhibits gender differences, with only female speakers adopting a compensatory strategy (tongue retraction). Articulatory findings further confirm the compensatory lowering of tongue position in [ẽ] and additionally demonstrate incomplete compensation for [ã]. 
Across vowels, articulatory adaptation is found primarily in the vertical dimension, operating to offset the acoustic influence of VP coupling, bringing F1 in nasal vowels closer to their oral counterparts.</p>","PeriodicalId":55608,"journal":{"name":"Phonetica","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2026-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147790263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Phonetica, Pub Date: 2026-04-22, DOI: 10.1515/phon-2025-0061
Yuanyuan Zhang, Thomas De Valck, Odette Scharenborg
{"title":"Speech recognition performance disparities between Dutch diverse speaker groups.","authors":"Yuanyuan Zhang, Thomas De Valck, Odette Scharenborg","doi":"10.1515/phon-2025-0061","DOIUrl":"https://doi.org/10.1515/phon-2025-0061","url":null,"abstract":"<p><p>Current state-of-the-art automatic speech recognition (ASR) systems recognize typical speech (very) well. However, recent research has shown that their performance degrades for \"diverse\" speech, i.e., speech that diverges from \"typical\" speech due to, among others, demographic and sociolinguistic factors. In this work, given the rapid development of ASR technologies, we examined the performance of nine recently released ASR systems developed by Google, Microsoft, Meta, NVIDIA, and OpenAI, and three custom ASR models trained from scratch, on Dutch diverse speech. Our results showed that although overall recognition results differ quite substantially between the different systems, all systems show similar patterns regarding recognition performance for diverse speaker groups: for most ASR systems and models, language proficiency differences and severe speech motor impairment had a greater impact on performance disparities between speaker groups than demographic or sociolinguistic factors, indicating that acoustic variability due to demographic and sociolinguistic factors is well-represented in \"typical speech\" training data and consequently is well-modeled in the models. Furthermore, we found that differences in data processing pipelines and decoding setups significantly influenced recognition performance. 
Importantly, updates to company-developed ASR systems do not always improve performance of or reduce performance disparities between diverse speaker groups.</p>","PeriodicalId":55608,"journal":{"name":"Phonetica","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2026-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147790318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Phonetica, Pub Date: 2026-02-20, DOI: 10.1515/phon-2025-0069
Mohamed Afkir, Georgia Zellou
{"title":"Phonetic and phonological enhancement strategies in Tarifit robot-directed speech.","authors":"Mohamed Afkir, Georgia Zellou","doi":"10.1515/phon-2025-0069","DOIUrl":"10.1515/phon-2025-0069","url":null,"abstract":"<p><p>This study examines phonological and phonetic adaptations in Robot-directed speech (Robot-DS) in Tarifit, an Amazigh language of Morocco. Thirty native speakers (younger and older adults) produced CCəC verbs in two contexts: baseline reading and interaction with a robot. Analyses focused on three features: (1) vowelless word realizations (schwa deletion), (2) schwa epenthesis in onset clusters, and (3) schwa duration. Results reveal categorical and gradient hyperarticulation in Robot-DS: vowelless forms were never produced; epenthetic schwas occurred more frequently, reducing consonant clusters; and prosodic template schwas were significantly lengthened. Age modulated durational, but not categorical, patterns: older adults produced greater lengthening in initial and confirm productions, suggesting pre-emptive hyperarticulation, while younger adults reserved maximal durational enhancement for error-repair contexts. These findings indicate that Tarifit speakers model robots as low-competence interlocutors. The results inform typological accounts of schwa variation and provide practical implications for Amazigh ASR and voice interface design.</p>","PeriodicalId":55608,"journal":{"name":"Phonetica","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2026-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146260290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Phonetica, Pub Date: 2025-12-19, Print Date: 2025-12-17, DOI: 10.1515/phon-2024-0034
Marzena Żygis, Sarah Wesolek, Nina Hosseini-Kivanani, Manfred Krifka
{"title":"The prosody of cheering in sports events: the case of long-distance running.","authors":"Marzena Żygis, Sarah Wesolek, Nina Hosseini-Kivanani, Manfred Krifka","doi":"10.1515/phon-2024-0034","DOIUrl":"10.1515/phon-2024-0034","url":null,"abstract":"<p><p>Since cheering has not yet been systematically investigated from a phonetic perspective, this study explores its acoustic characteristics by focusing on a specific type of cheering: inciting calls directed at individual runners during long-distance races, using their names. We investigate its prosodic realization through an experimental approach. We present findings from a production study with recordings in the lab comparing cheering utterances to neutral speech. Thirty native speakers of German were asked to cheer on an individual marathon runner in a sporting event shown in a video by calling out his or her name (1-5 syllables). For comparison, the participants also produced the same names in isolation and in carrier sentences. Our results reveal four different cheering patterns: (i) separately produced items of similar duration, (ii) division of items into syllables, (iii) mixed pattern of (i) and (ii), and finally (iv) a singing pattern, again mixed with (i) and (ii). 
When cheering for the marathon runners, participants used a higher fundamental frequency, a wider F0 range, longer item duration, slower speech rates, and increased intensity.</p>","PeriodicalId":55608,"journal":{"name":"Phonetica","volume":" ","pages":"489-524"},"PeriodicalIF":1.1,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12743243/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145783716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Phonetica, Pub Date: 2025-11-24, Print Date: 2025-12-17, DOI: 10.1515/phon-2025-0003
Fenqi Wang
{"title":"Modeling the acoustic profiles of vocal emotions in American English and Mandarin Chinese.","authors":"Fenqi Wang","doi":"10.1515/phon-2025-0003","DOIUrl":"10.1515/phon-2025-0003","url":null,"abstract":"<p><p>This study examined the acoustic profiles of five basic emotions in American English and Mandarin Chinese using a big data approach. A total of 6,373 features were extracted using the openSMILE toolkit, and key discriminative features were identified through random forest classification. In American English, vocal emotions were primarily conveyed through pitch-related features, while Mandarin Chinese, shaped by its tonal constraints, relied more on spectral and voice quality cues, including MFCCs, HNR, and shimmer. Linear mixed-effects models confirmed significant effects of emotion on the top-ranked features, and Cohen's <i>d</i> further supported distinct acoustic profiles for each emotion. K-means clustering revealed both categorical groupings and dimensional overlaps, such as the clustering of high-arousal emotions like happy and surprised, and low-arousal emotions like sad and neutral. These results suggest that vocal emotion expression is shaped by language-specific prosodic systems, as well as by both discrete emotion categories and continuous affective dimensions, supporting an integrated model of emotional prosody.</p>","PeriodicalId":55608,"journal":{"name":"Phonetica","volume":" ","pages":"445-487"},"PeriodicalIF":1.1,"publicationDate":"2025-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145566425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
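As an illustration of the effect-size step mentioned in the abstract above, the following is a minimal sketch of Cohen's d for two independent samples using the pooled standard deviation. The abstract does not state which variant of d was used, so the pooled-SD form is an assumption, and the F0 values are made up for the example rather than drawn from the study.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled SD."""
    na, nb = len(a), len(b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


# Hypothetical mean-F0 values (Hz) for two emotions; not data from the study.
happy = [220.0, 235.0, 228.0, 241.0, 226.0]
sad = [180.0, 176.0, 190.0, 185.0, 179.0]

d = cohens_d(happy, sad)
print(f"Cohen's d = {d:.2f}")
```

The sign of d follows the argument order, so swapping the two samples flips the sign but not the magnitude.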
Phonetica, Pub Date: 2025-11-10, Print Date: 2025-12-17, DOI: 10.1515/phon-2025-0025
Kimiko Tsukada, Đích Mục Đào, Trang Thi Huyen Le
{"title":"Cross-language perception of the Japanese singleton/geminate contrasts: comparison of Vietnamese speakers with and without Japanese language experience.","authors":"Kimiko Tsukada, Đích Mục Đào, Trang Thi Huyen Le","doi":"10.1515/phon-2025-0025","DOIUrl":"10.1515/phon-2025-0025","url":null,"abstract":"<p><p>We examined the perception of Japanese consonant length by three groups of Vietnamese speakers and a group of 10 Japanese speakers. Two of the Vietnamese groups consisted of learners of Japanese with one group participating in Vietnam (<i>n</i> = 17) and the other in Japan (<i>n</i> = 13). The third Vietnamese group consisted of 12 participants inexperienced in Japanese. Unlike Japanese, consonant length is non-contrastive in Vietnamese. Thus, we were interested in how different experience with Japanese may influence the perception of difficult Japanese contrasts. The overall mean discriminability in <i>d</i>-prime was 1.0, 1.9, 3.1 and 4.5 for the non-learner group, the learner group in Vietnam, the learner group in Japan and the native Japanese group, respectively. A clear difference between the two learner groups demonstrates learnability of Japanese consonant length for adults. At the same time, the qualitative difference between the advanced learners and native Japanese speakers suggests genuine and persistent difficulty of Japanese consonant length. 
By providing additional empirical data beyond the segmental level, this study helps us to better evaluate the extent to which current theories of second language (L2) speech learning account for the acquisition of a wide range of L2 sounds by speakers from diverse first language backgrounds.</p>","PeriodicalId":55608,"journal":{"name":"Phonetica","volume":" ","pages":"391-416"},"PeriodicalIF":1.1,"publicationDate":"2025-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145472498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
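For readers unfamiliar with the d-prime scores reported in the abstract above, here is a minimal sketch of one common way to compute d′ from response counts. The abstract does not describe the task design or the exact formula, so the yes/no-style definition, the log-linear correction, and the counts below are all assumptions for illustration.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate), with a log-linear correction."""
    # Adding 0.5 to each count and 1 to each total keeps both rates
    # strictly between 0 and 1, so the z-scores stay finite.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for two listeners; not data from the study.
chance = d_prime(hits=20, misses=20, false_alarms=20, correct_rejections=20)
good = d_prime(hits=38, misses=2, false_alarms=3, correct_rejections=37)
print(f"chance-level d' = {chance:.2f}, sensitive listener d' = {good:.2f}")
```

A listener responding at chance has equal hit and false-alarm rates and therefore a d′ of zero, while higher values indicate better discrimination of the contrast.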
Phonetica, Pub Date: 2025-11-05, Print Date: 2025-12-17, DOI: 10.1515/phon-2025-0022
Huichao Bi, Rong Yan, Samad Zare
{"title":"The association between phonological awareness and connected speech perception: an experimental study on young Chinese EFL learners from cue processing perspective.","authors":"Huichao Bi, Rong Yan, Samad Zare","doi":"10.1515/phon-2025-0022","DOIUrl":"10.1515/phon-2025-0022","url":null,"abstract":"<p><p>Connected speech, characterized by phonological variations such as contractions and elisions, poses unique challenges for second language learners, yet research on its perception in young EFL populations remains limited. This study examined English connected speech perception in 72 Chinese EFL children with varying phonological awareness (PA) levels through systematic manipulation of familiarity and salience of acoustic-phonetic and semantic cues. Results demonstrated concurrent activation of both cues, challenging the abstractionist model. Additionally, high PA levels correlated with superior perceptual accuracy and greater cue-weighting flexibility, although no significant difference was observed between high and low PA groups under conditions of low cue familiarity and salience. These findings suggest that PA is necessary but insufficient for connected speech perception. Instead, strategic cue weighting plays a vital role, highlighting that EFL instruction should develop young learners' ability to flexibly utilize multiple cues.</p>","PeriodicalId":55608,"journal":{"name":"Phonetica","volume":" ","pages":"417-443"},"PeriodicalIF":1.1,"publicationDate":"2025-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145446556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Phonetica, Pub Date: 2025-10-28, Print Date: 2025-10-27, DOI: 10.1515/phon-2025-0001
Chenzi Xu
{"title":"Plastic Mandarin tones: regional identity in prosody.","authors":"Chenzi Xu","doi":"10.1515/phon-2025-0001","DOIUrl":"10.1515/phon-2025-0001","url":null,"abstract":"<p><p>Local differentiation and innovation in spoken Mandarin are now ubiquitous in metropolises in China, accelerated by the widespread promotion of the national lingua franca Standard Mandarin. This paper examines the lexical tones of Plastic Mandarin, a newly crystallised urban Mandarin dialect in Changsha, where the dominant regional variety has been Changsha Xiang. This paper establishes the lexical tone system for Plastic Mandarin using Growth Curve Analysis and explores its development by comparing its tones with those of Changsha produced by the same group of multilingual speakers, employing Generalised Additive Mixed Modelling. The findings show that, while Plastic Mandarin shares the same tone categories as Standard Mandarin, it features distinctive f0 patterns, some of which closely resemble their corresponding Changsha tones. These f0 patterns, imbued with regional identity, are likely motivated by phonetic variation, systemic constraints, social biases, and language contact. The case of Plastic Mandarin may exemplify contact-induced tonal contour changes in forming new Mandarin varieties.</p>","PeriodicalId":55608,"journal":{"name":"Phonetica","volume":" ","pages":"331-362"},"PeriodicalIF":1.1,"publicationDate":"2025-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145380041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}