Individuality-Preserving Voice Reconstruction for Articulation Disorders Using Text-to-Speech Synthesis

Proceedings of the 2015 ACM on International Conference on Multimodal Interaction. Published: 2015-11-09. DOI: 10.1145/2818346.2820770

Reina Ueda, T. Takiguchi, Y. Ariki


Citations: 4

Abstract

This paper presents a speech synthesis method for people with articulation disorders. Because the movements of such speakers are limited by their athetoid symptoms, their prosody is often unstable and their speech rate differs from that of a physically unimpaired person, which causes their speech to be less intelligible and, consequently, makes communication with physically unimpaired persons difficult. To deal with these problems, this paper describes a Hidden Markov Model (HMM)-based text-to-speech synthesis approach that preserves the individuality of a person with an articulation disorder and aids them in their communication. In our method, a duration model of a physically unimpaired person is used for the HMM synthesis system, and an F0 model in the system is trained using the F0 patterns of the physically unimpaired person, with the average F0 being converted to the target F0 in advance. To preserve the target speaker's individuality, a spectral model is built from the target speaker's spectra. Through experimental evaluations, we have confirmed that the proposed method successfully synthesizes intelligible speech while maintaining the target speaker's individuality.
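
The abstract says the unimpaired speaker's average F0 is converted to the target speaker's average F0 before the F0 model is trained, but it does not say how. A minimal sketch, assuming a common approach (a constant, speaker-level shift in the log-F0 domain computed over voiced frames only, with F0 = 0 marking unvoiced frames), is shown below. The helper names `mean_log_f0` and `shift_f0` and the toy contours are hypothetical and not taken from the paper; only this F0 preprocessing step is illustrated, not HMM training or the duration and spectral models.

```python
import numpy as np
from typing import Iterable


def mean_log_f0(f0_contours: Iterable[np.ndarray]) -> float:
    """Mean log-F0 over voiced frames (F0 > 0), pooled across all contours.

    Assumption: unvoiced frames are stored as 0, a common convention.
    """
    pooled = np.concatenate([np.asarray(c, dtype=float) for c in f0_contours])
    voiced = pooled[pooled > 0]
    if voiced.size == 0:
        raise ValueError("no voiced frames found")
    return float(np.log(voiced).mean())


def shift_f0(f0: np.ndarray, log_offset: float) -> np.ndarray:
    """Add a constant in the log-F0 domain to voiced frames.

    Unvoiced frames (0) are left unchanged, and ratios between voiced
    frames (the shape of the intonation contour) are preserved.
    """
    f0 = np.asarray(f0, dtype=float)
    out = f0.copy()
    voiced = f0 > 0
    out[voiced] = np.exp(np.log(f0[voiced]) + log_offset)
    return out


# Toy contours in Hz (hypothetical values, for illustration only).
unimpaired = [np.array([0.0, 120.0, 130.0]), np.array([125.0, 0.0, 118.0])]
target = [np.array([210.0, 0.0, 230.0]), np.array([220.0])]

# Speaker-level offset computed once, in advance, then applied to every
# training utterance of the unimpaired speaker.
offset = mean_log_f0(target) - mean_log_f0(unimpaired)
converted = [shift_f0(c, offset) for c in unimpaired]
print([np.round(c, 1) for c in converted])
```

Computing one offset over the whole corpus, rather than per utterance, keeps each utterance's intonation relative to the speaker's average intact; only the overall pitch level moves toward the target speaker.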



