Authors: James Tanner, Yasuaki Shinohara, Faith Chiu
Journal: JASA Express Letters, volume 5, issue 8 (published 2025-08-01)
DOI: https://doi.org/10.1121/10.0039066
Language-specific phonetic realisation of stop voicing contrasts in English and Japanese synthesised speech.
Speech synthesis has improved dramatically over recent years, enabled by large datasets and advances in neural network architectures. Little is known, however, about how synthesised speech patterns are realised from a phonetic perspective. By synthesising speech in two languages with differing implementations of stop voicing, we observe that synthesised speech broadly follows expected patterns for each language, though partially diverges for specific segments. Synthesising speakers into the opposing language also results in stops similar to target language distributions. These findings demonstrate the capability of speech synthesis models to encode phonetic information and further motivate questions regarding the phonetics of synthesised speech.