Voice quality enhancement for vocal tract rehabilitation

2018 3rd Biennial South African Biomedical Engineering Conference (SAIBMEC) Pub Date : 2018-04-01 DOI:10.1109/SAIBMEC.2018.8363197

Bianca Sutcliffe, L. Wiggins, D. Rubin, V. Aharonson

Abstract

Vocal rehabilitation devices used by patients after laryngectomy produce unnatural-sounding speech. Our study aims to increase the quality of these synthetically generated voices by introducing human-like characteristics. A simplified source-filter model, linear predictive coding coefficients, and line spectral frequencies were used to model the vocal tract and manipulate the acoustic features of the resulting speech. Two mapping functions were employed to convert the features of the synthetically generated voice into those of a human voice: a Gaussian mixture model and a linear regression model. The models were trained on a set of 50 human and 50 synthetic voice utterances. Both mapping functions yielded significant changes in the transformed synthetic voices, whose spectra were similar to those of the human voices. The linear regression mapping produced slightly better results than the Gaussian mixture model mapping. Listening tests confirmed this result, but indicated that voices re-synthesized from the transformed model coefficients improved on the synthetic voice yet still sounded unnatural. This may imply that the vocal tract model lacks information that produces the subjective perception of “artificial speech”. Future work will investigate a more elaborate model that includes the speech production excitation and radiation signals and the transformation of their features. These models have the potential to improve the conversion of synthetically generated electrolarynx voice into a human-sounding one.
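The feature pipeline named in the abstract (linear predictive coding, line spectral frequencies, and a linear regression mapping) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no analysis settings, so the autocorrelation method, the function names (`lpc`, `lpc_to_lsf`, `fit_linear_map`, `apply_linear_map`), and the assumption that synthetic and human frames are already paired one-to-one are all illustrative choices. The Gaussian mixture model mapping is not shown.

```python
import numpy as np


def lpc(frame, order):
    """LPC coefficients [1, a1, ..., ap] via autocorrelation + Levinson-Durbin.

    Assumption: the abstract does not say how LPC was estimated.
    """
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for model order i.
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lpc_to_lsf(a):
    """Line spectral frequencies (radians, ascending) from LPC coefficients."""
    a_ext = np.append(a, 0.0)
    p_poly = a_ext + a_ext[::-1]  # symmetric polynomial P(z)
    q_poly = a_ext - a_ext[::-1]  # antisymmetric polynomial Q(z)
    angles = np.angle(np.concatenate([np.roots(p_poly), np.roots(q_poly)]))
    # Keep one root per conjugate pair; drop the trivial roots at z = 1, -1.
    return np.sort(angles[(angles > 1e-6) & (angles < np.pi - 1e-6)])


def fit_linear_map(src, tgt):
    """Least-squares affine map from source to target feature rows.

    Assumption: row i of src and row i of tgt describe matching frames.
    """
    src_aug = np.hstack([src, np.ones((src.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(src_aug, tgt, rcond=None)
    return weights


def apply_linear_map(weights, src):
    """Apply a map returned by fit_linear_map to source feature rows."""
    src_aug = np.hstack([src, np.ones((src.shape[0], 1))])
    return src_aug @ weights
```

In use, each synthetic frame would be passed through `lpc` and `lpc_to_lsf`, the rows stacked into a matrix, and `fit_linear_map` trained against the matching human rows. A regression output is not guaranteed to stay in ascending order, so mapped line spectral frequencies would need to be checked before converting back to filter coefficients.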
