Designing a large recording script for open-domain English speech synthesis*

Phonetics and Speech Sciences. Pub Date: 2021-09-01. DOI: 10.13064/ksss.2021.13.3.065

Sunhee Kim, Hojeong Kim, Yooseop Lee, Boryoung Kim, Yongkook Won, Bongwan Kim


Citations: 0

Abstract



This paper proposes a method for designing a large recording script for open-domain English speech synthesis. For read-aloud style text, 12 domains and 294 sub-domains were designed using text from five different news media publications. For conversational style text, 4 domains and 36 sub-domains were designed using movie subtitles. The final script consists of 43,013 sentences (27,085 read-aloud style and 15,928 conversational style), comprising 549,683 tokens and 38,356 types. The completed script is analyzed using four criteria: word coverage (type coverage and token coverage), high-frequency vocabulary coverage, phonetic coverage (diphone coverage and triphone coverage), and readability. The type coverage of the script reaches 36.86% despite its low token coverage of 2.97%. The high-frequency vocabulary coverage of the script is 73.82%, and the diphone and triphone coverage of the whole script are 86.70% and 38.92%, respectively. The average readability of the sentences is 9.03. The analysis shows that the proposed method is effective in producing a large recording script for English speech synthesis, with good coverage of unique words, high-frequency vocabulary, and phonetic units, and good readability.
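The coverage criteria named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions, so the following are assumptions made for illustration only: type coverage is taken as the share of distinct words in a reference corpus that also occur in the script, token coverage as the ratio of script tokens to reference tokens, and diphone or triphone coverage as the share of a given inventory of phone n-grams that occur in the script. The function names, the simple tokenizer, and the toy data are hypothetical and are not taken from the paper.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def word_coverage(script_sentences, reference_sentences):
    """Return (type_coverage, token_coverage) of a script against a reference corpus.

    Assumed definitions (not confirmed by the abstract):
      type coverage  = |script types that are in the reference| / |reference types|
      token coverage = number of script tokens / number of reference tokens
    """
    script_tokens = [t for s in script_sentences for t in tokenize(s)]
    ref_tokens = [t for s in reference_sentences for t in tokenize(s)]
    if not ref_tokens:
        raise ValueError("reference corpus is empty")
    ref_types = set(ref_tokens)
    type_cov = len(set(script_tokens) & ref_types) / len(ref_types)
    token_cov = len(script_tokens) / len(ref_tokens)
    return type_cov, token_cov


def phone_ngram_coverage(script_phones, inventory, n):
    """Return the share of phone n-grams in `inventory` that occur in the script.

    `script_phones` is a list of phone sequences, one per sentence.
    `inventory` is a set of n-tuples of phones; n=2 gives diphones, n=3 triphones.
    """
    if not inventory:
        raise ValueError("inventory is empty")
    seen = set()
    for phones in script_phones:
        for i in range(len(phones) - n + 1):
            seen.add(tuple(phones[i:i + n]))
    return len(seen & inventory) / len(inventory)


if __name__ == "__main__":
    # Toy data, purely illustrative.
    reference = ["the cat sat on the mat", "a dog ran"]
    script = ["the cat sat"]
    print(word_coverage(script, reference))

    diphones = {("k", "ae"), ("ae", "t"), ("t", "k"), ("ae", "k")}
    print(phone_ngram_coverage([["k", "ae", "t"]], diphones, 2))
```

Under these assumed definitions, a script that is a small sample of a large corpus can still contain a large share of the corpus vocabulary, which is consistent with the abstract reporting a type coverage well above its token coverage.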

