Transformation of Speaker Characteristics in Speech Using Support Vector Machines

15th International Conference on Advanced Computing and Communications (ADCOM 2007). Pub Date: 2007-12-18. DOI: 10.1109/ADCOM.2007.124

K.S. Rao, S. Koolagudi

Citations: 8

Abstract

In this paper we propose support vector machines (SVMs) for transforming the speaker characteristics of speech. Speaker characteristics are mainly influenced by the behavioural characteristics (prosody) of the speaker, the characteristics of the vocal tract system, and the excitation source. In this work, speaker transformation means modifying the speaker characteristics of the speech to match a desired (target) speaker while preserving the underlying message (the sequence of sound units, i.e., the text) of the original speech. This is done by deriving mapping functions that transform the vocal tract characteristics and the prosodic characteristics; SVMs are explored for deriving these mapping functions. The prosodic parameters and the characteristics of the vocal tract system and the excitation source of the target speaker are obtained from the output of the mapping functions. The prosodic parameters (durational characteristics, pitch contour (intonation pattern), and intensity patterns) are manipulated by modifying the linear prediction (LP) residual using knowledge of the instants of significant excitation. The modified LP residual is used to excite a time-varying filter whose parameters are updated according to the desired vocal tract characteristics. The target speaker's speech is synthesized and evaluated using listening tests. The results of the listening tests indicate that the proposed SVM-based mapping functions provide better speaker transformation than the earlier methods proposed by the author.
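The source-filter step described in the abstract (an LP residual exciting an all-pole filter) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the synthetic two-sinusoid test signal, the LP order of 4, and the use of the autocorrelation method with the Levinson-Durbin recursion are all assumptions made for illustration. The sketch only shows that inverse filtering yields a residual and that exciting the same filter with that residual rebuilds the signal. In a transformation system, the residual and the filter coefficients would be modified between these two steps, which is not shown here.

```python
import math


def autocorrelation(signal, max_lag):
    """Return the autocorrelation values r[0..max_lag] of the signal."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations by the Levinson-Durbin recursion.

    Returns predictor coefficients a[1..order] such that s[n] is
    approximated by sum_k a[k] * s[n - k].
    """
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:
            break  # signal is perfectly predictable; stop early
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= (1.0 - k * k)
    return a[1:]


def lp_residual(signal, coeffs):
    """Inverse-filter the signal: e[n] = s[n] - sum_k a[k] * s[n - k]."""
    residual = []
    for n, sample in enumerate(signal):
        prediction = sum(c * signal[n - k]
                         for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        residual.append(sample - prediction)
    return residual


def synthesize(residual, coeffs):
    """Excite the all-pole filter: s[n] = e[n] + sum_k a[k] * s[n - k]."""
    output = []
    for n, excitation in enumerate(residual):
        prediction = sum(c * output[n - k]
                         for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        output.append(excitation + prediction)
    return output


# Synthetic stand-in for a short voiced frame (an assumption, not real speech).
signal = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n + 0.4) for n in range(200)]
order = 4  # illustrative LP order; speech systems typically use a higher order

coeffs = levinson_durbin(autocorrelation(signal, order), order)
residual = lp_residual(signal, coeffs)
rebuilt = synthesize(residual, coeffs)
max_error = max(abs(x - y) for x, y in zip(signal, rebuilt))
```

With unmodified residual and coefficients, `rebuilt` matches `signal` up to floating-point error. Replacing `coeffs` frame by frame with coefficients derived from a mapping function, and reshaping `residual` around the excitation instants, is where the vocal tract and prosody changes described in the abstract would enter.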