基于TTS合成语音的跨域说话人识别

2021 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2020-11-24 DOI:10.1109/SLT48900.2021.9383525

Yiling Huang, Yutian Chen, Jason W. Pelecanos, Quan Wang

{"title":"基于TTS合成语音的跨域说话人识别","authors":"Yiling Huang, Yutian Chen, Jason W. Pelecanos, Quan Wang","doi":"10.1109/SLT48900.2021.9383525","DOIUrl":null,"url":null,"abstract":"In recent years, Text-To-Speech (TTS) has been used as a data augmentation technique for speech recognition to help complement inadequacies in the training data. Correspondingly, we investigate the use of a multi-speaker TTS system to synthesize speech in support of speaker recognition. In this study we focus the analysis on tasks where a relatively small number of speakers is available for training. We observe on our datasets that TTS synthesized speech improves cross-domain speaker recognition performance and can be combined effectively with multi-style training. Additionally, we explore the effectiveness of different types of text transcripts used for TTS synthesis. Results suggest that matching the textual content of the target domain is a good practice, and if that is not feasible, a transcript with a sufficiently large vocabulary is recommended.","PeriodicalId":243211,"journal":{"name":"2021 IEEE Spoken Language Technology Workshop (SLT)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"Synth2Aug: Cross-Domain Speaker Recognition with TTS Synthesized Speech\",\"authors\":\"Yiling Huang, Yutian Chen, Jason W. Pelecanos, Quan Wang\",\"doi\":\"10.1109/SLT48900.2021.9383525\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, Text-To-Speech (TTS) has been used as a data augmentation technique for speech recognition to help complement inadequacies in the training data. Correspondingly, we investigate the use of a multi-speaker TTS system to synthesize speech in support of speaker recognition. In this study we focus the analysis on tasks where a relatively small number of speakers is available for training. We observe on our datasets that TTS synthesized speech improves cross-domain speaker recognition performance and can be combined effectively with multi-style training. Additionally, we explore the effectiveness of different types of text transcripts used for TTS synthesis. Results suggest that matching the textual content of the target domain is a good practice, and if that is not feasible, a transcript with a sufficiently large vocabulary is recommended.\",\"PeriodicalId\":243211,\"journal\":{\"name\":\"2021 IEEE Spoken Language Technology Workshop (SLT)\",\"volume\":\"28 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-11-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE Spoken Language Technology Workshop (SLT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SLT48900.2021.9383525\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE Spoken Language Technology Workshop (SLT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SLT48900.2021.9383525","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 9

摘要

近年来，文本到语音(TTS)作为一种数据增强技术被用于语音识别，以帮助弥补训练数据的不足。相应地，我们研究了使用多说话人TTS系统来合成语音以支持说话人识别。在这项研究中，我们将分析重点放在可用于培训的演讲者数量相对较少的任务上。在我们的数据集上，我们观察到TTS合成语音提高了跨域说话人识别性能，并且可以有效地与多风格训练相结合。此外，我们还探讨了用于TTS合成的不同类型文本转录本的有效性。结果表明，匹配目标领域的文本内容是一种很好的做法，如果这是不可行的，建议使用具有足够大词汇量的文本。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Synth2Aug: Cross-Domain Speaker Recognition with TTS Synthesized Speech

In recent years, Text-To-Speech (TTS) has been used as a data augmentation technique for speech recognition to help complement inadequacies in the training data. Correspondingly, we investigate the use of a multi-speaker TTS system to synthesize speech in support of speaker recognition. In this study we focus the analysis on tasks where a relatively small number of speakers is available for training. We observe on our datasets that TTS synthesized speech improves cross-domain speaker recognition performance and can be combined effectively with multi-style training. Additionally, we explore the effectiveness of different types of text transcripts used for TTS synthesis. Results suggest that matching the textual content of the target domain is a good practice, and if that is not feasible, a transcript with a sufficiently large vocabulary is recommended.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2021 IEEE Spoken Language Technology Workshop (SLT)

自引率

0.00%

发文量