利用 LSTM 神经网络进行跨语言语音分割，并采用迭代修正程序

IF 1.8 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computational Intelligence Pub Date : 2023-09-19 DOI:10.1111/coin.12602

Zdeněk Hanzlíček, Jindřich Matoušek, Jakub Vít

{"title":"利用 LSTM 神经网络进行跨语言语音分割，并采用迭代修正程序","authors":"Zdeněk Hanzlíček, Jindřich Matoušek, Jakub Vít","doi":"10.1111/coin.12602","DOIUrl":null,"url":null,"abstract":"<p>This article describes experiments on speech segmentation using long short-term memory recurrent neural networks. The main part of the paper deals with multi-lingual and cross-lingual segmentation, that is, it is performed on a language different from the one on which the model was trained. The experimental data involves large Czech, English, German, and Russian speech corpora designated for speech synthesis. For optimal multi-lingual modeling, a compact phonetic alphabet was proposed by sharing and clustering phones of particular languages. Many experiments were performed exploring various experimental conditions and data combinations. We proposed a simple procedure that iteratively adapts the inaccurate default model to the new voice/language. The segmentation accuracy was evaluated by comparison with reference segmentation created by a well-tuned hidden Markov model-based framework with additional manual corrections. The resulting segmentation was also employed in a unit selection text-to-speech system. The generated speech quality was compared with the reference segmentation by a preference listening test.</p>","PeriodicalId":55228,"journal":{"name":"Computational Intelligence","volume":"40 2","pages":""},"PeriodicalIF":1.8000,"publicationDate":"2023-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/coin.12602","citationCount":"0","resultStr":"{\"title\":\"Using LSTM neural networks for cross-lingual phonetic speech segmentation with an iterative correction procedure\",\"authors\":\"Zdeněk Hanzlíček, Jindřich Matoušek, Jakub Vít\",\"doi\":\"10.1111/coin.12602\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>This article describes experiments on speech segmentation using long short-term memory recurrent neural networks. The main part of the paper deals with multi-lingual and cross-lingual segmentation, that is, it is performed on a language different from the one on which the model was trained. The experimental data involves large Czech, English, German, and Russian speech corpora designated for speech synthesis. For optimal multi-lingual modeling, a compact phonetic alphabet was proposed by sharing and clustering phones of particular languages. Many experiments were performed exploring various experimental conditions and data combinations. We proposed a simple procedure that iteratively adapts the inaccurate default model to the new voice/language. The segmentation accuracy was evaluated by comparison with reference segmentation created by a well-tuned hidden Markov model-based framework with additional manual corrections. The resulting segmentation was also employed in a unit selection text-to-speech system. The generated speech quality was compared with the reference segmentation by a preference listening test.</p>\",\"PeriodicalId\":55228,\"journal\":{\"name\":\"Computational Intelligence\",\"volume\":\"40 2\",\"pages\":\"\"},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2023-09-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1111/coin.12602\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computational Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1111/coin.12602\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational Intelligence","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/coin.12602","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

本文介绍了使用长短期记忆递归神经网络进行语音分割的实验。论文的主要部分涉及多语言和跨语言分段，即在不同于训练模型的语言上进行分段。实验数据包括用于语音合成的大型捷克语、英语、德语和俄语语音库。为了优化多语言建模，通过共享和聚类特定语言的音素，提出了一个紧凑的音素字母表。我们进行了许多实验，探索各种实验条件和数据组合。我们提出了一个简单的程序，通过迭代使不准确的默认模型适应新的语音/语言。通过与基于隐马尔可夫模型的框架所创建的参考分段进行比较，并进行额外的人工修正，评估了分段的准确性。在单元选择文本到语音系统中也采用了由此产生的分段。通过偏好听力测试，将生成的语音质量与参考分段进行了比较。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

Using LSTM neural networks for cross-lingual phonetic speech segmentation with an iterative correction procedure

查看原文本刊更多论文

Using LSTM neural networks for cross-lingual phonetic speech segmentation with an iterative correction procedure

This article describes experiments on speech segmentation using long short-term memory recurrent neural networks. The main part of the paper deals with multi-lingual and cross-lingual segmentation, that is, it is performed on a language different from the one on which the model was trained. The experimental data involves large Czech, English, German, and Russian speech corpora designated for speech synthesis. For optimal multi-lingual modeling, a compact phonetic alphabet was proposed by sharing and clustering phones of particular languages. Many experiments were performed exploring various experimental conditions and data combinations. We proposed a simple procedure that iteratively adapts the inaccurate default model to the new voice/language. The segmentation accuracy was evaluated by comparison with reference segmentation created by a well-tuned hidden Markov model-based framework with additional manual corrections. The resulting segmentation was also employed in a unit selection text-to-speech system. The generated speech quality was compared with the reference segmentation by a preference listening test.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Computational Intelligence 工程技术-计算机：人工智能

CiteScore

6.90

自引率

3.60%

发文量

审稿时长

>12 weeks

期刊介绍： This leading international journal promotes and stimulates research in the field of artificial intelligence (AI). Covering a wide range of issues - from the tools and languages of AI to its philosophical implications - Computational Intelligence provides a vigorous forum for the publication of both experimental and theoretical research, as well as surveys and impact studies. The journal is designed to meet the needs of a wide range of AI workers in academic and industrial research.