Speech Synthesis Workshop最新文献_第2页

Contextual Representation using Recurrent Neural Network Hidden State for Statistical Parametric Speech Synthesis 基于递归神经网络隐藏状态的上下文表示用于统计参数语音合成

Speech Synthesis Workshop Pub Date : 2016-09-13 DOI: 10.21437/SSW.2016-28

Sivanand Achanta, Rambabu Banoth, Ayushi Pandey, Anandaswarup Vadapalli, S. Gangashetty

引用次数: 1

WikiSpeech - enabling open source text-to-speech for Wikipedia WikiSpeech -为维基百科启用开源文本到语音

Speech Synthesis Workshop Pub Date : 2016-09-13 DOI: 10.21437/SSW.2016-16

J. Andersson, S. Berlin, André Costa, Harald Berthelsen, Hanna Lindgren, N. Lindberg, J. Beskow, Jens Edlund, Joakim Gustafson

引用次数: 4

Open-Source Consumer-Grade Indic Text To Speech 开源消费级索引文本到语音

Speech Synthesis Workshop Pub Date : 2016-09-13 DOI: 10.21437/SSW.2016-31

Andrew Wilkinson, A. Parlikar, Sunayana Sitaram, Tim White, A. Black, Suresh Bazaj

引用次数: 10

Prosodic and Spectral iVectors for Expressive Speech Synthesis 表达性语音合成的韵律和谱向量

Speech Synthesis Workshop Pub Date : 2016-09-13 DOI: 10.21437/SSW.2016-10

Igor Jauk, A. Bonafonte

{"title":"Prosodic and Spectral iVectors for Expressive Speech Synthesis","authors":"Igor Jauk, A. Bonafonte","doi":"10.21437/SSW.2016-10","DOIUrl":"https://doi.org/10.21437/SSW.2016-10","url":null,"abstract":"This work presents a study on the suitability of prosodic andacoustic features, with a special focus on i-vectors, in expressivespeech analysis and synthesis. For each utterance of two dif-ferent databases, a laboratory recorded emotional acted speech,and an audiobook, several prosodic and acoustic features are ex-tracted. Among them, i-vectors are built not only on the MFCCbase, but also on F0, power and syllable durations. Then, un-supervised clustering is performed using different feature com-binations. The resulting clusters are evaluated calculating clus-ter entropy for labeled portions of the databases. Additionally,synthetic voices are trained, applying speaker adaptive training,from the clusters built from the audiobook. The voices are eval-uated in a perceptual test where the participants have to edit anaudiobook paragraph using the synthetic voices.The objective results suggest that i-vectors are very use-ful for the audiobook, where different speakers (book charac-ters) are imitated. On the other hand, for the laboratory record-ings, traditional prosodic features outperform i-vectors. Also,a closer analysis of the created clusters suggest that differentspeakers use different prosodic and acoustic means to conveyemotions. The perceptual results suggest that the proposed i-vector based feature combinations can be used for audiobookclustering and voice training.","PeriodicalId":340820,"journal":{"name":"Speech Synthesis Workshop","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130595296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 3

Nonaudible murmur enhancement based on statistical voice conversion and noise suppression with external noise monitoring 基于统计语音转换和外部噪声监测的噪声抑制的不可听杂音增强

Speech Synthesis Workshop Pub Date : 2016-09-13 DOI: 10.21437/SSW.2016-9

Y. Tajiri, T. Toda

{"title":"Nonaudible murmur enhancement based on statistical voice conversion and noise suppression with external noise monitoring","authors":"Y. Tajiri, T. Toda","doi":"10.21437/SSW.2016-9","DOIUrl":"https://doi.org/10.21437/SSW.2016-9","url":null,"abstract":"This paper presents a method for making nonaudible murmur (NAM) enhancement based on statistical voice conversion (VC) robust against external noise. NAM, which is an extremely soft whispered voice, is a promising medium for silent speech communication thanks to its faint volume. Although such a soft voice can still be detected with a special body-conductive microphone, its quality significantly degrades compared to that of air-conductive voices. It has been shown that the statistical VC technique is capable of significantly improving quality of NAM by converting it into the air-conductive voices. However, this technique is not helpful under noisy conditions because a detected NAM signal easily suffers from external noise, and acoustic mismatches are caused between such a noisy NAM signal and a previously trained conversion model. To address this issue, in this paper we apply our proposed noise suppression method based on external noise monitoring to the statistical NAM enhancement. Moreover, a known noise superimposition method is further applied in order to alleviate the effects of residual noise components on the conversion accuracy. The experimental results demonstrate that the proposed method yields significant improvements in the conversion accuracy compared to the conventional method.","PeriodicalId":340820,"journal":{"name":"Speech Synthesis Workshop","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114400296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Jerk Minimization for Acoustic-To-Articulatory Inversion 声学-发音反转的震动最小化

Speech Synthesis Workshop Pub Date : 2016-09-13 DOI: 10.21437/SSW.2016-14

Avni Rajpal, H. Patil

引用次数: 1

Multi-output RNN-LSTM for multiple speaker speech synthesis with α-interpolation model 基于α-插值模型的多输出RNN-LSTM多扬声器语音合成

Speech Synthesis Workshop Pub Date : 2016-09-13 DOI: 10.21437/SSW.2016-19

Santiago Pascual, A. Bonafonte

引用次数: 10

Non-intrusive Quality Assessment of Synthesized Speech using Spectral Features and Support Vector Regression 基于谱特征和支持向量回归的合成语音非侵入性质量评估

Speech Synthesis Workshop Pub Date : 2016-09-13 DOI: 10.21437/SSW.2016-21

Meet H. Soni, H. Patil

{"title":"Non-intrusive Quality Assessment of Synthesized Speech using Spectral Features and Support Vector Regression","authors":"Meet H. Soni, H. Patil","doi":"10.21437/SSW.2016-21","DOIUrl":"https://doi.org/10.21437/SSW.2016-21","url":null,"abstract":"In this paper, we propose a new quality assessment method for synthesized speech. Unlike previous approaches which uses Hidden Markov Model (HMM) trained on natural utterances as a reference model to predict the quality of synthesized speech, proposed approach uses knowledge about synthesized speech while training the model. The previous approach has been successfully applied in the quality assessment of synthesized speech for the German language. However, it gave poor results for English language databases such as Blizzard Challenge 2008 and 2009 databases. The problem of quality assessment of synthesized speech is posed as a regression problem. The mapping between statistical properties of spectral features extracted from the speech signal and corresponding speech quality score (MOS) was found using Support Vector Regression (SVR). All the experiments were done on Blizzard Challenge Databases of the year 2008, 2009, 2010 and 2012. The results of experiments show that by including knowledge about synthesized speech while training, the performance of quality assessment system can be improved. Moreover, the accuracy of quality assessment system heavily depends on the kind of synthesis system used for signal generation. On Blizzard 2008 and 2009 database, proposed approach gives correlation of 0.28 and 0.49 , respectively, for about 17 % data used in training. Previous approach gives correlation of 0.3 and 0.09 , respectively, using spectral features. For Blizzard 2012 database, proposed approach gives correlation of 0.8 by using 12 % of available data in training.","PeriodicalId":340820,"journal":{"name":"Speech Synthesis Workshop","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116724346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 6

Mandarin Prosodic Phrase Prediction based on Syntactic Trees 基于句法树的汉语韵律短语预测

Speech Synthesis Workshop Pub Date : 2016-09-13 DOI: 10.21437/SSW.2016-26

Zhengchen Zhang, Fuxiang Wu, Chenyu Yang, M. Dong, Fu-qiu Zhou

引用次数: 10

Automatic, model-based detection of pause-less phrase boundaries from fundamental frequency and duration features 从基本频率和持续时间特征中自动，基于模型的无停顿短语边界检测

Speech Synthesis Workshop Pub Date : 2016-09-13 DOI: 10.21437/SSW.2016-1

Mahsa Sadat Elyasi Langarani, J. V. Santen

引用次数: 0