{"title":"Mapping to perceptual identification in Mandarin learners of English","authors":"Kenneth de Jong , Yu-Jung Lin , Yen-Chen Hao , Hanyong Park","doi":"10.1016/j.wocn.2025.101411","DOIUrl":"10.1016/j.wocn.2025.101411","url":null,"abstract":"<div><div>This paper examines the relationship between cross-language segmental mapping and second language identification accuracy in Taiwan Mandarin speakers learning English, and compares this relationship with that found in previous, parallel research on Korean learners of English. Mapping and identification data were collected for English anterior plosives and non-sibilant fricatives, by means of two parallel identification experiments. Mapping data came from a 17-alternative identification task with <em>Zhuyin Fuhao</em> labels (phonetic script used to annotate Mandarin sounds in Taiwan), and identification data came from a 15-alternative identification task with Roman labels, both applied to the same stimuli. Mapping data were used to generate predictions about the identification performance by estimating what the performance would be, given the use of only the Mandarin categories. Like the previous Korean data, Mandarin speakers exhibited identification rates for plosives that are very close to predicted, indicating that their plosive identification performance was heavily entangled with their Mandarin system, while fricative identification performance was greatly under-predicted by the mapping data. Further analyses of category differentiation measured with <em>d</em>-prime estimates showed that Mandarin speakers’ manner differentiation performance was very well-predicted by the mapping data, while Korean speakers’ laryngeal differentiation was better predicted. Taken together, these results indicate that the second language identification performance and the cross-language mapping into the first language are closely entangled in a single system. 
The additional second language component appears in a pervasive increment in performance in the second language beyond what is predicted from the first language system, and in more unaccounted-for variance in laryngeal identification than in manner identification.</div></div>","PeriodicalId":51397,"journal":{"name":"Journal of Phonetics","volume":"110 ","pages":"Article 101411"},"PeriodicalIF":1.9,"publicationDate":"2025-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143868895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Contribution of F0 and phonation to tone perception in the Zaiwa language","authors":"Yao Lu, Changwei Liang, Jiangping Kong","doi":"10.1016/j.wocn.2025.101413","DOIUrl":"10.1016/j.wocn.2025.101413","url":null,"abstract":"<div><div>Previous research on categorical perception of tone has primarily examined the influence of fundamental frequency (F0), while the role of phonation, though increasingly studied, remains underexplored. This study investigates the role of phonation and how it interacts with F0 cues in tone perception, using the Zaiwa language as a case study. Specifically, we examine the categorical perception of Tone 44 (produced with a pressed voice) and Tone 35 (produced with a modal voice). To achieve this, we first conducted an acoustic analysis of the Zaiwa tone system, which forms the basis for our novel method of speech synthesis. Using this method, we created six tonal continua between Tone 44 and Tone 35 by systematically modifying F0 alone, phonation alone, and both simultaneously. Native Zaiwa speakers then participated in an experiment using the categorical perception paradigm with these synthesized continua. The results indicate that the participants were unable to distinguish the phonemic categories of the two tones when only phonation was modified. While modifying F0 alone allowed for tone distinction, participants’ perception followed a continuous pattern. However, when both F0 and phonation were modified simultaneously, participants accurately identified the phonemic categories of tones and perceived the continuum between the two tones categorically. These findings suggest that both F0 and phonation serve as perceptual cues for distinguishing Tone 44 and Tone 35 in Zaiwa, with F0 as the primary cue and phonation as a secondary cue. 
However, phonation remains crucial, as its absence weakens the categorical perception of these tones.</div></div>","PeriodicalId":51397,"journal":{"name":"Journal of Phonetics","volume":"110 ","pages":"Article 101413"},"PeriodicalIF":1.9,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143863559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Corrigendum to “Towards a dynamical account of inter-segmental coordination” [J. Phon. 109 (2025) 101392]","authors":"Shihao Du, Stephan R. Kuberski, Adamantios I. Gafos","doi":"10.1016/j.wocn.2025.101414","DOIUrl":"10.1016/j.wocn.2025.101414","url":null,"abstract":"","PeriodicalId":51397,"journal":{"name":"Journal of Phonetics","volume":"110 ","pages":"Article 101414"},"PeriodicalIF":1.9,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143838437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Contextual and paradigmatic effects on suspended contrast across generations: The case of Cantonese pinjam revisited","authors":"Alan C.L. Yu , Vivian Guo Li , Peggy P.K. Mok","doi":"10.1016/j.wocn.2025.101412","DOIUrl":"10.1016/j.wocn.2025.101412","url":null,"abstract":"<div><div>Suspended contrast refers to the phenomenon whereby sound change brings two phonemes into such close approximation that semantic contrast between them is suspended for native speakers of the language, without necessarily leading to complete merger or neutralization. The existence of suspended contrasts not only raises questions about the nature of the phonetics-phonology interface, but also for theories of sound change that assume sound change is biased toward selective maintenance of phonemes that contribute more to distinguishing existing lexical items in usage. Small differences supporting a suspended contrast are expected to disappear quickly given that they do not serve any apparent communicative functions. It remains a question whether a contrast can be suspended for a considerable period of time. This study revisits a case of suspended contrast in Cantonese between the lexical high rising tone and the high rising tone derived through morphological tone change (<em>pinjam</em>). We use an apparent-time approach to investigate the diachronic trajectory of this neutralization by comparing the distribution of this suspended contrast along both F0 and durational dimensions across two generations of Hong Kong Cantonese speakers. While this case of suspended tonal contrast has been in circulation for almost a century, our findings suggest that the distinction might be disappearing among the younger speakers. Only older speakers maintain a distinction between the lexical and derived rising tones, albeit in very restricted tonal contexts. 
The fact that this suspended tonal contrast exhibits great sensitivity to contextual and morphological influences may help explain the progression of this case of merger-in-progress.</div></div>","PeriodicalId":51397,"journal":{"name":"Journal of Phonetics","volume":"110 ","pages":"Article 101412"},"PeriodicalIF":1.9,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143816847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Normalization, essentialization, and the erasure of social and linguistic variation","authors":"Santiago Barreda","doi":"10.1016/j.wocn.2025.101409","DOIUrl":"10.1016/j.wocn.2025.101409","url":null,"abstract":"<div><div>Linguists investigating the phonetic properties of vowels, e.g. height and frontness, often use normalization algorithms to remove ‘irrelevant’ variation from vowel formant data. The current conception and evaluation of these algorithms focuses on phonemic classification and the removal of ‘anatomical’ variation, an approach which suggests an essentialist perspective on linguistic variation and leads to the erasure and underreporting of linguistic and social information. Instead, it is suggested that for many purposes, researchers need algorithms that correctly represent phonetic information by removing only <em>non-phonetic</em> formant variation. Acoustic variation that does not affect phonetic properties is non-phonetic, making it ‘transparent’ to the linguistic system and incapable of communicating linguistic contrast. Evidence is presented that only the uniform scaling of formant patterns appears to be non-phonetic, indicating that uniform scaling normalization algorithms should be preferred. Finally, given that phonetic properties are products of human psychology that enter into experience only through perception, it is argued that the normalization algorithms used by phoneticians and sociolinguists should be thought of as models of human perception. 
The change to a perceptual and phonetic, rather than anatomical and phonemic, approach to normalization will promote more reliable and theoretically sound research outcomes, and better align with linguistic theory.</div></div>","PeriodicalId":51397,"journal":{"name":"Journal of Phonetics","volume":"110 ","pages":"Article 101409"},"PeriodicalIF":1.9,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143808284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Gesture-Field-Register (GFR) framework for modeling F0 control","authors":"Seung-Eun Kim , Sam Tilsen","doi":"10.1016/j.wocn.2025.101410","DOIUrl":"10.1016/j.wocn.2025.101410","url":null,"abstract":"<div><div>In this study, we introduce an F0 modeling framework – which we refer to as the Gesture-Field-Register (GFR) framework – in which F0 production involves joint control of relatively generic intentions and how those intentions are mapped to physical F0 values. Building on Articulatory Phonology (AP) and Task Dynamics (TD), the GFR framework considers F0 gestures to be the fundamental units of F0 control. It further holds (i) that the dynamic target F0 state of a speaker is determined by the blending of F0 gestural targets in a planning field and (ii) that the gestural targets and dynamic targets are represented in normalized values which are converted to F0 in Hz via dynamic control of F0 register. We show how this framework accounts for a variety of empirical F0 patterns, and we present a case study that uses parameter optimization to analyze empirical F0 contours into a time series of gestural activation and register states. In doing so, we demonstrate that the framework allows for gestural targets to be invariant within an utterance, despite the fact that the surface contours are highly variable. Model code and examples for generating and fitting F0 contours are publicly available in GitHub and OSF repositories. 
Overall, the GFR framework provides a novel way of conceptualizing and modeling F0 control under AP/TD and further expands the AP/TD by incorporating the mechanisms of a planning field and dynamic register control.</div></div>","PeriodicalId":51397,"journal":{"name":"Journal of Phonetics","volume":"110 ","pages":"Article 101410"},"PeriodicalIF":1.9,"publicationDate":"2025-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143697848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Processing pronunciation variation with independently mappable allophones","authors":"Rachel Soo, Molly Babel","doi":"10.1016/j.wocn.2025.101402","DOIUrl":"10.1016/j.wocn.2025.101402","url":null,"abstract":"<div><div>Sound change can present synchronic variation with categorical pronunciation variants. This is the case in Cantonese, where syllable-initial /n/ is merging with /l/, occasionally creating homophones (e.g., <em>lou5</em> 腦 “brain”/ 老“old”) and giving rise to [n]- and [l]-initial pronunciation variants that are allophones. This pronunciation variation offers insight into how variation is processed in spoken word recognition because [n] and [l] in Cantonese are not associated with an orthographic standard. Across four experiments, we examine the perception, recognition, and encoding of Cantonese [n] and [l], and use Bayesian analyses where gradient interpretations are more straightforward. We observe perceptual evidence that these allophones are distinguishable (Exp 2). In recognition (Exp 1) and encoding (Exp 3) paradigms, we find that the [n] and [l] allophones are processed neither equivalently nor distinctly when the targets bear the more common [l]-initial allophone. When the targets bear the [n]-initial allophone (Exp 4), we observe high error rates, and somewhat contradictory results. 
Altogether, the results suggest that [n] and [l] are allophonic variants independently mapped to a phoneme, with connection strengths varying as a function of the frequency, such that the more common [l]-initial pronunciation demonstrates an overall recognition advantage.</div></div>","PeriodicalId":51397,"journal":{"name":"Journal of Phonetics","volume":"110 ","pages":"Article 101402"},"PeriodicalIF":1.9,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143684746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The acquisition of Multicultural London English: Child and adolescent diphthong variation in West London","authors":"Rosamund Oxbury , Matthew Hunt , Kathleen M. McCarthy","doi":"10.1016/j.wocn.2024.101388","DOIUrl":"10.1016/j.wocn.2024.101388","url":null,"abstract":"<div><div>This study investigated Multicultural London English (MLE) diphthongs as produced by children and adolescents in the London borough of Ealing, UK. We conducted an acoustic analysis of the diphthongs <span>face</span>, <span>price</span> and <span>goat</span> in the speech of 24 young people aged 16–24 years and 14 children aged 5–7 years. The results revealed different production patterns between the children and adolescents for some but not all the diphthong variables. We found that the children’s and adolescents’ diphthongs were similar in the quality of the onset, and similar to the MLE system described in East London, in the London borough of Hackney. However, the children had not acquired monophthongization of the diphthongs, with adolescents producing significantly more monophthongal tokens of <span>price</span>, <span>goat</span> and, to a lesser extent, <span>face</span>. These findings have implications both for the study of multiethnolects and MLE, and for research on children’s acquisition of sociophonetic variation.</div></div>","PeriodicalId":51397,"journal":{"name":"Journal of Phonetics","volume":"109 ","pages":"Article 101388"},"PeriodicalIF":1.9,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143520857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What are you sinking about? Experience with unfamiliar accent produces both inhibition and facilitation during lexical processing","authors":"Yevgeniy Vasilyevich Melguy , Keith Johnson","doi":"10.1016/j.wocn.2025.101401","DOIUrl":"10.1016/j.wocn.2025.101401","url":null,"abstract":"<div><div>Speech produced with an unfamiliar accent may pose a challenge for listeners, resulting in delayed processing and/or decreased intelligibility. Such costs may be due to a mismatch between listeners’ experience with how a given sound category is phonetically realized, and how it is implemented by an unfamiliar speaker. Phonetic mismatches can increase processing time, but listeners could avoid them by adjusting their expectations for a given speaker or speech variety. This study investigates how changes in phonetic category structure may facilitate (or inhibit) processing of novel words produced with either the same or a phonetically similar accent, asking whether such adaptation is driven by a <em>targeted shift</em> or <em>expansion</em> of phonetic category boundaries. An artificial accent was created by morphing voiceless fricatives /θ/ and /s/ to create phonetically ambiguous [θ/s], which was presented in disambiguating /θ/ word frames (e.g., <em>hypo[</em>θ<em>/s]etical</em>). To examine the effect of phonetic learning on word processing, listeners were divided into three groups and asked to complete an exposure task where they heard either (1) accented critical /θ/ words, (2) natural (unaccented) /θ/ words, or (3) no /θ/ words. All listeners then completed a cross-modal priming task where, across two experiments, they were tested on their processing of words produced with the same artificial accent or three related accents differing in their phonetic match to the training accent. 
Overall, results show that while there was no effect of prior exposure on processing of novel words produced with the exposure accent, listeners with prior accent exposure showed a distinct pattern of facilitation and inhibition when processing words produced with the novel accents, compared to listeners with no prior accent exposure. Interestingly, listeners with prior exposure to unaccented /θ/ words tended to pattern with the accented /θ/ exposure group, rather than with controls. The roles of acoustic/perceptual similarity and prior experience are discussed, along with implications of these results for a <em>category expansion</em> mechanism of phonetic learning.</div><div>All data, stimuli, and code for this study are freely available on OSF via <span><span>https://osf.io/xw5k3/</span></span>.</div></div>","PeriodicalId":51397,"journal":{"name":"Journal of Phonetics","volume":"109 ","pages":"Article 101401"},"PeriodicalIF":1.9,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143534921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dipping and Falling as competing strategies for maintaining the distinctiveness of the low tone in the four-tone system of Kaifeng Mandarin","authors":"Lei Wang , Marco van de Ven , Carlos Gussenhoven","doi":"10.1016/j.wocn.2025.101391","DOIUrl":"10.1016/j.wocn.2025.101391","url":null,"abstract":"<div><div>Kaifeng Mandarin has four tones (LH, HL, H, L), among which the citation pronunciation of L is an f0 fall (the ‘Falling variant’) for some speakers and a falling-rising f0 contour (the ‘Dipping variant’) for others. Seeking to comprehend the rationale behind this idiosyncratic variation, we decided to investigate the distinctiveness of each variant of L with each of the three other tones, LH, HL and H. Accordingly, we constructed six ten-step f0 continua, using two naturally spoken syllables [ma] spoken by a male and a female speaker as source files. In a two-alternative forced choice task, the Falling and Dipping variants turned out to be equally distinctive. Specifically, the results revealed distinct categorizations between the Dipping variant and HL as well as between the Falling variant and LH. However, when the Dipping variant needed to be distinguished from LH and the Falling variant from HL, recognition accuracy dropped significantly, favoring the complex tone. The two L-variants were equally discriminable from H. This overall functional similarity of the two variants goes some way towards understanding their coexistence within the same speech community. 
Because communicative intentions played no role in the experiment, it remains to be seen if the distribution across speakers will remain stable in production experiments that vary communicative duress, as created by the need to discriminate between the L-tone and each of the two complex tones.</div></div>","PeriodicalId":51397,"journal":{"name":"Journal of Phonetics","volume":"109 ","pages":"Article 101391"},"PeriodicalIF":1.9,"publicationDate":"2025-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143137776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}