Synth2Aug: Cross-Domain Speaker Recognition with TTS Synthesized Speech

2021 IEEE Spoken Language Technology Workshop (SLT). Pub Date: 2020-11-24. DOI: 10.1109/SLT48900.2021.9383525

Yiling Huang, Yutian Chen, Jason W. Pelecanos, Quan Wang

Citations: 9

Abstract

In recent years, Text-To-Speech (TTS) has been used as a data augmentation technique for speech recognition to help complement inadequacies in the training data. Correspondingly, we investigate the use of a multi-speaker TTS system to synthesize speech in support of speaker recognition. In this study we focus the analysis on tasks where a relatively small number of speakers is available for training. We observe on our datasets that TTS synthesized speech improves cross-domain speaker recognition performance and can be combined effectively with multi-style training. Additionally, we explore the effectiveness of different types of text transcripts used for TTS synthesis. Results suggest that matching the textual content of the target domain is a good practice, and if that is not feasible, a transcript with a sufficiently large vocabulary is recommended.
