12th ISCA Speech Synthesis Workshop (SSW2023): Latest Publications

FiPPiE: A Computationally Efficient Differentiable method for Estimating Fundamental Frequency From Spectrograms
12th ISCA Speech Synthesis Workshop (SSW2023) Pub Date : 2023-08-26 DOI: 10.21437/ssw.2023-34
L. Finkelstein, Chun-an Chan, Vincent Wan, H. Zen, Rob Clark
Abstract: In this paper we present FiPPiE, a Filter-Inferred Pitch Posteriorgram Estimator: a method of estimating fundamental frequency from spectrograms, either linear or mel, by applying a special kind of filter in the spectral domain. Unlike other works in this field, we developed a procedure for training an optimized filter (or kernel) for this type of estimation. FiPPiE, based on this optimized filter, demonstrated itself as a reliable fundamental frequency estimator that is computationally efficient, differentiable, and easily implementable. We demonstrate the performance of the method both by analysis of its behavior on human recordings and by stability analysis with the help of an automated system.
Citations: 0
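The paper's trained kernel is not reproduced here, but the core operation it describes (sliding a filter along the frequency axis of a spectrogram and applying a softmax to obtain a differentiable posteriorgram over F0 candidates) can be sketched as follows. The function name, shapes, and toy kernel are illustrative assumptions, not the authors' code:

```python
import numpy as np

def pitch_posteriorgram(spec, kernel):
    """Slide a 1-D kernel along the frequency axis of each frame and
    softmax the responses into a per-frame distribution over F0 bins.
    spec: (frames, freq_bins); kernel: (k,) with k <= freq_bins."""
    frames, n_bins = spec.shape
    k = len(kernel)
    n_out = n_bins - k + 1
    scores = np.empty((frames, n_out))
    for i in range(n_out):
        scores[:, i] = spec[:, i:i + k] @ kernel  # correlation at each candidate bin
    # softmax over candidate bins: differentiable, sums to 1 per frame
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# toy frame with all energy at frequency bin 10
spec = np.zeros((1, 32))
spec[0, 10] = 1.0
post = pitch_posteriorgram(spec, np.array([0.25, 0.5, 0.25]))
print(post.shape)             # (1, 30)
print(int(post[0].argmax()))  # 9: the kernel center aligns with bin 10
```

Because the whole pipeline is matrix products and a softmax, gradients flow through it, which is the property the paper emphasizes.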
Voice Cloning: Training Speaker Selection with Limited Multi-Speaker Corpus
12th ISCA Speech Synthesis Workshop (SSW2023) Pub Date : 2023-08-26 DOI: 10.21437/ssw.2023-27
David Guennec, Lily Wadoux, A. Sini, N. Barbot, Damien Lolive
Abstract: Text-to-speech synthesis with few data is a challenging task, in particular when choosing the target speaker is not an option. Voice cloning is a popular method to alleviate these issues using only a few minutes of target speech. To do this, the model must first be trained on a large corpus of thousands of hours and hundreds of speakers. In this paper, we tackle the challenge of cloning voices with a much smaller corpus, using both the speaker adaptation and speaker encoding methods. We study the impact of selecting our training speakers based on their similarity to the targets. We train models using only the training speakers closest/farthest to our targets in terms of speaker similarity, from a pool of 14 speakers. We show that the selection of speakers in the training set has an impact on the similarity to the target speaker. The effect is more prominent for speaker encoding than for adaptation. However, it remains nuanced when it comes to naturalness.
Citations: 0
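The abstract describes ranking training speakers by similarity to the target. A minimal sketch of such a selection step, assuming cosine similarity over speaker embeddings (the names, vectors, and dimensionality below are invented for illustration):

```python
import numpy as np

def closest_speakers(target_emb, pool, k):
    """Rank pool speakers by cosine similarity to the target embedding
    and return the k closest names."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    ranked = sorted(pool, key=lambda name: cos(target_emb, pool[name]), reverse=True)
    return ranked[:k]

# toy 2-D embeddings; real speaker embeddings are much higher-dimensional
pool = {
    "spk1": np.array([1.0, 0.0]),
    "spk2": np.array([0.7, 0.7]),
    "spk3": np.array([0.0, 1.0]),
}
target = np.array([0.9, 0.1])
print(closest_speakers(target, pool, 2))  # ['spk1', 'spk2']
```

Selecting the farthest speakers, as in the paper's contrast condition, would simply reverse the sort.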
Local Style Tokens: Fine-Grained Prosodic Representations For TTS Expressive Control
12th ISCA Speech Synthesis Workshop (SSW2023) Pub Date : 2023-08-26 DOI: 10.21437/ssw.2023-19
Martin Lenglet, O. Perrotin, G. Bailly
Abstract: Neural text-to-speech (TTS) models achieve great performance in terms of naturalness, but modeling expressivity remains an ongoing challenge. Some success was found through implicit approaches such as Global Style Tokens (GST), but these methods model speech style at the utterance level. In this paper, we propose to add an auxiliary module called Local Style Tokens (LST) to the encoder-decoder pipeline to model local variations in prosody. This module can implement various scales of representation; we chose word-level and phoneme-level prosodic representations to assess the ability of the proposed module to better model sub-utterance style variations. Objective evaluation of the synthetic speech shows that LST modules capture prosodic variations on 12 common styles better than a GST baseline. These results were validated by participants during listening tests.
Citations: 0
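GST-style modules attend over a bank of learned style tokens; the local variant described here applies the same mechanism per word or phoneme rather than once per utterance. A rough sketch, assuming simple softmax attention (token count, dimensions, and the per-phoneme queries are arbitrary assumptions):

```python
import numpy as np

def style_embedding(query, tokens):
    """Attend over a bank of learned style tokens: softmax similarity
    weights, then a weighted sum. One utterance-level query gives a
    GST-like global style; one query per word/phoneme gives local styles."""
    scores = tokens @ query
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ tokens

rng = np.random.default_rng(1)
tokens = rng.standard_normal((10, 16))           # 10 style tokens, dim 16
phoneme_queries = rng.standard_normal((5, 16))   # one query per phoneme
local_styles = np.stack([style_embedding(q, tokens) for q in phoneme_queries])
print(local_styles.shape)  # (5, 16): one style vector per phoneme
```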
The Impact of Pause-Internal Phonetic Particles on Recall in Synthesized Lectures
12th ISCA Speech Synthesis Workshop (SSW2023) Pub Date : 2023-08-26 DOI: 10.21437/ssw.2023-32
Mikey Elmers, Éva Székely
Abstract: We studied the effect of pause-internal phonetic particles (PINTs) on recall for native and non-native listeners of English in a listening experiment with synthesized material that simulated a university lecture. Using a neural speech synthesizer trained on recorded lectures with PINTs annotations, we generated three distinct conditions: a base version, a "silence" version where non-silence PINTs were replaced with silence, and a "nopints" version where all PINTs, including silences, were removed. Half of the participants were informed they were listening to computer-generated audio, while the other half were told the audio was recorded with a poor-quality microphone. We found that neither the condition nor the participants' native language significantly affected their overall score, and the presence of PINTs before critical information had a negative effect on recall. This study highlights the importance of considering PINTs for educational purposes in speech synthesis systems.
Citations: 0
Subjective Evaluation of Text-to-Speech Models: Comparing Absolute Category Rating and Ranking by Elimination Tests
12th ISCA Speech Synthesis Workshop (SSW2023) Pub Date : 2023-08-26 DOI: 10.21437/ssw.2023-30
K. Lakshminarayana, C. Dittmar, N. Pia, Emanuël Habets
Abstract: Modern text-to-speech (TTS) models are typically subjectively evaluated using an Absolute Category Rating (ACR) method. This method uses the mean opinion score to rate each model under test. However, if the models are perceptually too similar, assigning absolute ratings to stimuli might be difficult and prone to subjective preference errors. Pairwise comparison tests offer relative comparison and better capture some of the subtle differences between stimuli. However, pairwise comparisons take more time, as the number of tests grows quadratically with the number of models. Alternatively, a ranking-by-elimination (RBE) test can assess multiple models with similar benefits to pairwise comparisons for subtle differences across models, without the time penalty. We compared the ACR and RBE tests for TTS evaluation in a controlled experiment. We found that the obtained results were statistically similar, even in the presence of perceptually close TTS models.
Citations: 0
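The abstract does not spell out the RBE protocol; one plausible reading, in which a listener repeatedly eliminates the worst remaining stimulus until none are left, can be sketched as follows (the function name and the toy scores are assumptions, not the paper's procedure):

```python
def rank_by_elimination(stimuli, score_fn):
    """Repeatedly eliminate the worst remaining stimulus according to
    score_fn; returns the stimuli ordered from worst to best, i.e. in
    elimination order."""
    remaining = list(stimuli)
    order = []
    while remaining:
        worst = min(remaining, key=score_fn)
        remaining.remove(worst)
        order.append(worst)
    return order

# toy preference scores standing in for a listener's judgments
prefs = {"modelA": 3.2, "modelB": 4.1, "modelC": 2.8}
print(rank_by_elimination(prefs, prefs.get))  # ['modelC', 'modelA', 'modelB']
```

With n models this needs n elimination decisions per listener, versus n(n-1)/2 pairwise trials, which is the time saving the abstract alludes to.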
Importance of Human Factors in Text-To-Speech Evaluations
12th ISCA Speech Synthesis Workshop (SSW2023) Pub Date : 2023-08-26 DOI: 10.21437/ssw.2023-5
L. Finkelstein, Joshua Camp, R. Clark
Abstract: Both mean opinion score (MOS) evaluations and preference tests in text-to-speech are often associated with high rating variance. In this paper we investigate two important factors that affect that variance: one is the variance introduced by how raters are picked for a specific test, and the other is the dynamic behavior of individual raters across time. This paper raises awareness of these issues when designing an evaluation experiment, since the standard test-level confidence interval cannot incorporate the variance associated with these two factors. We show the impact of the two sources of variance and how they can be mitigated. We demonstrate that simple improvements in experiment design, such as using a smaller number of rating tasks per rater, can significantly improve experiment confidence intervals and reproducibility at no extra cost.
Citations: 0
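The rater-selection effect the paper describes can be illustrated with a toy simulation: if each rater carries a personal bias, a test mean computed from a few raters doing many tasks each inherits more variance across panel draws than one computed from many raters doing few tasks each, at the same total number of ratings. All numbers below are invented for illustration:

```python
import random
import statistics

def simulate_test_mean(n_raters, tasks_per_rater, rng):
    """Each rater has a personal bias; the test-level mean inherits
    variance from which raters happened to be picked for the panel."""
    scores = []
    for _ in range(n_raters):
        bias = rng.gauss(0, 0.5)               # rater-specific offset
        for _ in range(tasks_per_rater):
            scores.append(4.0 + bias + rng.gauss(0, 0.3))  # per-task noise
    return statistics.mean(scores)

rng = random.Random(0)
# same total budget of 120 ratings, spread over few vs. many raters
few  = [simulate_test_mean(6, 20, rng) for _ in range(200)]
many = [simulate_test_mean(40, 3, rng) for _ in range(200)]
print(statistics.stdev(few) > statistics.stdev(many))  # True
```

The rater bias only averages out across raters, not across repeated tasks by the same rater, which is why fewer tasks per rater (and hence more raters per test) tightens the test-level spread.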
Situating Speech Synthesis: Investigating Contextual Factors in the Evaluation of Conversational TTS
12th ISCA Speech Synthesis Workshop (SSW2023) Pub Date : 2023-08-26 DOI: 10.21437/ssw.2023-11
Harm Lameris, Ambika Kirkland, Joakim Gustafson, Éva Székely
Abstract: Speech synthesis evaluation methods have lagged behind the development of TTS systems, with single-sentence read-speech MOS naturalness evaluation on crowdsourcing platforms being the industry standard. For TTS to be applied successfully in social contexts, evaluation methods need to be socially embedded in the situation where they will be deployed. Due to the time and cost constraints of conducting an in-person interaction evaluation for TTS, we examine the effect of introducing situational context and preceding-sentence context to participants in a subjective listening experiment. We conduct a suitability evaluation for a robot game guide that explains game rules to participants using two synthesized spontaneous voices: an instruction-specific voice and a general spontaneous voice. Results indicate that the inclusion of context influences user ratings, highlighting the need for context-aware evaluations. However, the type of context did not significantly affect the results.
Citations: 0
Diffusion Transformer for Adaptive Text-to-Speech
12th ISCA Speech Synthesis Workshop (SSW2023) Pub Date : 2023-08-26 DOI: 10.21437/ssw.2023-25
Haolin Chen, Philip N. Garner
Abstract: Given the success of diffusion in synthesizing realistic speech, we investigate how diffusion can be included in adaptive text-to-speech systems. Inspired by the adaptable layer norm modules for the Transformer, we adapt a new backbone of diffusion models, the Diffusion Transformer, for acoustic modeling. Specifically, the adaptive layer norm in the architecture is used to condition the diffusion network on text representations, which further enables parameter-efficient adaptation. We show the new architecture to be a faster alternative to its convolutional counterpart for general text-to-speech, while demonstrating a clear advantage in naturalness and similarity over the Transformer for few-shot and few-parameter adaptation. In the zero-shot scenario, while the new backbone is a decent alternative, the main benefit of such an architecture is to enable high-quality parameter-efficient adaptation when fine-tuning is performed.
Citations: 1
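The adaptive layer norm conditioning referred to here can be sketched roughly as follows: the scale and shift of the normalization are predicted from a conditioning vector instead of being stored as learned constants. The shapes and the (1 + gamma) modulation form are assumptions based on common adaLN variants, not necessarily the paper's exact formulation:

```python
import numpy as np

def adaptive_layer_norm(x, cond, w_scale, w_shift, eps=1e-5):
    """LayerNorm whose per-channel scale and shift are predicted from a
    conditioning vector (e.g. a text or timestep embedding)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    normed = (x - mu) / np.sqrt(var + eps)
    gamma = cond @ w_scale   # (d,) modulation predicted from the condition
    beta = cond @ w_shift
    return normed * (1.0 + gamma) + beta

rng = np.random.default_rng(0)
d, c = 8, 4                              # feature dim, condition dim
x = rng.standard_normal((2, d))          # two tokens
cond = rng.standard_normal(c)            # conditioning vector
out = adaptive_layer_norm(x, cond,
                          rng.standard_normal((c, d)) * 0.1,
                          rng.standard_normal((c, d)) * 0.1)
print(out.shape)  # (2, 8)
```

Because only the small projection matrices carry the condition, fine-tuning just those matrices gives the parameter-efficient adaptation the abstract mentions.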
Data Augmentation Methods on Ultrasound Tongue Images for Articulation-to-Speech Synthesis
12th ISCA Speech Synthesis Workshop (SSW2023) Pub Date : 2023-08-26 DOI: 10.21437/ssw.2023-36
I. Ibrahimov, G. Gosztolya, T. Csapó
Abstract: Articulation-to-Speech Synthesis (ATS) focuses on converting articulatory biosignal information into audible speech, nowadays mostly using DNNs, with a future target application of a silent speech interface. Ultrasound Tongue Imaging (UTI) is an affordable and non-invasive technique that has become popular for collecting articulatory data. Data augmentation has been shown to improve the generalization ability of DNNs, e.g. to avoid overfitting, introduce variation into the existing dataset, or make the network more robust against various types of noise in the input data. In this paper, we compare six different data augmentation methods on the UltraSuite-TaL corpus during UTI-based ATS using CNNs. Validation mean squared error is used to evaluate the performance of the CNNs, while the performance of direct ATS is measured on the synthesized speech samples using MCD and PESQ scores. Although we did not find large differences in the outcomes of the various data augmentation techniques, the results of this study suggest that while applying data augmentation to UTI poses some challenges due to the unique nature of the data, it provides benefits in terms of enhancing the robustness of neural networks. In general, articulatory control might be beneficial in TTS as well.
Citations: 0
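The six augmentation methods compared in the paper are not listed in the abstract. Purely as an illustration, two generic image augmentations that could plausibly be applied to ultrasound frames (additive noise and a small spatial shift; parameters are invented) might look like:

```python
import numpy as np

def augment(img, rng):
    """Two cheap augmentations that preserve overall tongue-contour
    structure: additive Gaussian noise and a small horizontal shift."""
    noisy = img + rng.normal(0.0, 0.01, img.shape)
    shift = int(rng.integers(-2, 3))      # shift by -2..2 columns
    shifted = np.roll(noisy, shift, axis=1)
    return np.clip(shifted, 0.0, 1.0)     # keep valid pixel range

rng = np.random.default_rng(42)
frame = rng.random((64, 128))             # one toy ultrasound frame
aug = augment(frame, rng)
print(aug.shape == frame.shape)  # True
```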
Cross-lingual transfer using phonological features for resource-scarce text-to-speech
12th ISCA Speech Synthesis Workshop (SSW2023) Pub Date : 2023-08-26 DOI: 10.21437/ssw.2023-9
J. A. Louw
Abstract: In this work, we explore the use of phonological features in cross-lingual transfer within resource-scarce settings. We modify the architecture of VITS to accept a phonological feature vector as input, instead of phonemes or characters. Subsequently, we train multi-speaker base models using data from LibriTTS and then fine-tune them on single-speaker Afrikaans and isiXhosa datasets of varying sizes, representing the resource-scarce setting. We evaluate the synthetic speech both objectively and subjectively and compare it to models trained on the same data using the standard VITS architecture. In our experiments, the proposed system utilizing phonological features as input converges significantly faster and requires less data than the base system. We demonstrate that the model employing phonological features is capable of producing sounds in the target language that were unseen in the source language, even in languages with significant linguistic differences, and with only 5 minutes of data in the target language.
Citations: 0
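The idea of replacing phoneme symbols with phonological feature vectors can be sketched as follows. An unseen target-language phoneme decomposes into articulatory features the model already saw during source-language training, which is what makes the cross-lingual transfer possible. The feature inventory and phoneme table here are simplified illustrations, not the paper's actual feature set:

```python
# A fixed ordering of articulatory features defines the input vector.
FEATURES = ["voiced", "nasal", "plosive", "fricative", "labial", "alveolar", "velar"]

# Each phoneme is a bundle of active features (toy subset).
PHONEMES = {
    "p": {"plosive", "labial"},
    "b": {"voiced", "plosive", "labial"},
    "m": {"voiced", "nasal", "labial"},
    "t": {"plosive", "alveolar"},
}

def feature_vector(phoneme):
    """Binary vector over FEATURES, fed to the model instead of a
    phoneme symbol embedding."""
    active = PHONEMES[phoneme]
    return [1 if f in active else 0 for f in FEATURES]

print(feature_vector("b"))  # [1, 0, 1, 0, 1, 0, 0]
```

A target-language phoneme absent from the source inventory still maps to a vector in this same feature space, so the model needs no new input symbols at fine-tuning time.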