Perceptual identification of oral and nasalized vowels across American English and British English listeners and TTS voices

Impact factor: 1.5 | JCR: Q2 (Communication)

Frontiers in Communication | Pub Date: 2023-12-11 | DOI: 10.3389/fcomm.2023.1307547

Jakub Gwizdzinski, Santiago Barreda, Christopher Carignan, Georgia Zellou


Citations: 0

Abstract


Perceptual identification of oral and nasalized vowels across American English and British English listeners and TTS voices

Nasal coarticulation is when the lowering of the velum for a nasal consonant co-occurs with the production of an adjacent vowel, causing the vowel to become (at least partially) nasalized. In the case of anticipatory nasal coarticulation, enhanced coarticulatory magnitude on the vowel facilitates the identification of an upcoming nasal coda consonant. However, nasalization also affects the acoustic properties of the vowel, including formant frequencies. Thus, while anticipatory nasalization may help facilitate perception of a nasal coda consonant, it may at the same time cause difficulty in the correct identification of preceding vowels. Prior work suggests that the temporal degree of nasal coarticulation is greater in American English (US) than British English (UK), yet the perceptual consequences of these differences have not been explored. The current study investigates perceptual confusions for oral and nasalized vowels in US and UK TTS voices by US and UK listeners. We use TTS voices, in particular, to explore these perceptual consequences during human-computer interaction, which is increasing due to the rise of speech-enabled devices. Listeners heard words with oral and nasal codas produced by US and UK voices, masked with noise, and made lexical identifications from a set of options varying in vowel and coda contrasts. We find the strongest effect of speaker dialect on accurate word selection: overall accuracy is highest for UK Oral Coda words (83%) and lower for US Oral Coda words (67%); the lowest accuracy was for words with Nasal Codas in both dialects (UK Nasal = 61%; US Nasal = 60%). Error patterns differed across dialects: both listener groups made more errors in identifying nasal codas in words produced in UK English than those produced in US English. Yet, the rate of errors in identifying the quality of nasalized vowels was similarly lower than that of oral vowels across both varieties. We discuss the implications of these results for cross-dialectal coarticulatory variation, human-computer interaction, and perceptually driven sound change.
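The four accuracy figures in the abstract can be compared more directly as percentage-point differences. The following is a minimal sketch that uses only the numbers reported above; the function names `nasal_cost` and `dialect_gap` are illustrative labels introduced here, not terms from the paper, and the differences are simple arithmetic, not the authors' statistical analysis.

```python
# Accuracy (percent correct) by voice dialect and coda type,
# copied from the abstract. Everything else here is illustrative.
accuracy = {
    ("UK", "oral"): 83,
    ("US", "oral"): 67,
    ("UK", "nasal"): 61,
    ("US", "nasal"): 60,
}


def nasal_cost(voice: str) -> int:
    """Percentage-point drop from oral-coda to nasal-coda words for one voice dialect."""
    return accuracy[(voice, "oral")] - accuracy[(voice, "nasal")]


def dialect_gap(coda: str) -> int:
    """Percentage-point advantage of UK voices over US voices for one coda type."""
    return accuracy[("UK", coda)] - accuracy[("US", coda)]


for voice in ("UK", "US"):
    print(f"{voice} voice: oral minus nasal = {nasal_cost(voice)} points")
for coda in ("oral", "nasal"):
    print(f"{coda} coda: UK minus US = {dialect_gap(coda)} points")
```

On these figures, the gap between UK and US voices is large for oral-coda words and close to zero for nasal-coda words, so the drop from oral to nasal codas is much steeper for the UK voices.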


Source journal