A full band adaptive Harmonic Model based Speaker Identity Transformation using Radial Basis Function
Ankita N. Chadha, J. Nirmal
2017 11th International Conference on Intelligent Systems and Control (ISCO)
DOI: 10.1109/ISCO.2017.7855985 (https://doi.org/10.1109/ISCO.2017.7855985)
Citations: 1
Abstract
Speaker transformation adapts the speaker-dependent characteristics of a source speaker to those of a target speaker, so that the transformed speech is perceived as the target speaker's. It is generally carried out using a speech analysis-synthesis system. Full-band adaptive Harmonic Model (a-HM) based analysis-synthesis can produce high-quality resynthesized speech. This paper therefore proposes a full-band a-HM to represent the speaker-dependent parameters of the source and target speech signals. A Radial Basis Function (RBF) neural network is developed to capture the non-linear relationship between the source and target a-HM based features. In the state-of-the-art baseline method, Line Spectral Frequencies (LSF) represent the vocal tract and the linear prediction (LP) residual represents the glottal excitation of the speech signal; an RBF maps the LSFs of the source speaker to those of the target speaker, and a state-of-the-art residual selection method modifies the source residual toward the target residual. The performance of the proposed a-HM based speaker transformation is compared with that of the state-of-the-art features using various objective and subjective measures. The results show that a-HM feature based speaker transformation outperforms the state-of-the-art technique.
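
To make the mapping step concrete, the following is a minimal sketch of an RBF network that regresses target-speaker feature vectors from source-speaker feature vectors. It is not the authors' implementation: the abstract does not specify the kernel, the choice of centers, or the training procedure, so the Gaussian kernel, centers sampled at random from the training frames, ridge-regularized least squares for the output weights, and the names `rbf_design`, `fit_rbf`, and `transform` are all assumptions made for illustration. The sketch also assumes the source and target frames have already been time-aligned, so that row `i` of `X_src` corresponds to row `i` of `Y_tgt`.

```python
import numpy as np


def rbf_design(X, centers, width):
    """Gaussian RBF activations: one row per frame, one column per center."""
    # Squared Euclidean distance between every frame and every center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))


def fit_rbf(X_src, Y_tgt, n_centers=20, width=1.0, ridge=1e-6, seed=0):
    """Fit the output weights on fixed centers (all settings are assumptions)."""
    rng = np.random.default_rng(seed)
    # Assumption: centers are a random subset of the source training frames.
    n_centers = min(n_centers, len(X_src))
    idx = rng.choice(len(X_src), size=n_centers, replace=False)
    centers = X_src[idx]
    # Hidden-layer activations plus a constant bias column.
    Phi = np.hstack([rbf_design(X_src, centers, width),
                     np.ones((len(X_src), 1))])
    # Ridge-regularized normal equations for the linear output layer.
    A = Phi.T @ Phi + ridge * np.eye(Phi.shape[1])
    W = np.linalg.solve(A, Phi.T @ Y_tgt)
    return centers, width, W


def transform(X, model):
    """Map source feature frames to predicted target feature frames."""
    centers, width, W = model
    Phi = np.hstack([rbf_design(X, centers, width),
                     np.ones((len(X), 1))])
    return Phi @ W
```

In use, `fit_rbf` would be called once on aligned training frames, and `transform` would then be applied frame by frame to unseen source features before resynthesis. The same sketch applies whether the feature vectors are a-HM based or LSF based, since the network only sees fixed-length vectors.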