Sparse nonlinear representation for voice conversion

Toru Nakashika, T. Takiguchi, Y. Ariki
{"title":"Sparse nonlinear representation for voice conversion","authors":"Toru Nakashika, T. Takiguchi, Y. Ariki","doi":"10.1109/ICME.2015.7177437","DOIUrl":null,"url":null,"abstract":"In voice conversion, sparse-representation-based methods have recently been garnering attention because they are, relatively speaking, not affected by over-fitting or over-smoothing problems. In these approaches, voice conversion is achieved by estimating a sparse vector that determines which dictionaries of the target speaker should be used, calculated from the matching of the input vector and dictionaries of the source speaker. The sparse-representation-based voice conversion methods can be broadly divided into two approaches: 1) an approach that uses raw acoustic features in the training data as parallel dictionaries, and 2) an approach that trains parallel dictionaries from the training data. In our approach, we follow the latter approach and systematically estimate the parallel dictionaries using a joint-density restricted Boltzmann machine with sparse constraints. Through voice-conversion experiments, we confirmed the high-performance of our method, comparing it with the conventional Gaussian mixture model (GMM)-based approach, and a non-negative matrix factorization (NMF)-based approach, which is based on sparse representation.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE International Conference on Multimedia and Expo (ICME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICME.2015.7177437","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

In voice conversion, sparse-representation-based methods have recently been garnering attention because they are relatively unaffected by over-fitting and over-smoothing problems. In these approaches, voice conversion is achieved by estimating a sparse vector that determines which dictionaries of the target speaker should be used; this vector is calculated by matching the input vector against the dictionaries of the source speaker. Sparse-representation-based voice conversion methods can be broadly divided into two approaches: 1) an approach that uses raw acoustic features in the training data as parallel dictionaries, and 2) an approach that trains parallel dictionaries from the training data. We follow the latter approach and systematically estimate the parallel dictionaries using a joint-density restricted Boltzmann machine with sparse constraints. Through voice-conversion experiments, we confirmed the high performance of our method, comparing it with the conventional Gaussian mixture model (GMM)-based approach and a sparse-representation-based non-negative matrix factorization (NMF) approach.
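
To make the dictionary mechanism concrete, the following is a minimal sketch of exemplar-based sparse voice conversion (the first approach listed above), assuming non-negative spectral features and NMF-style multiplicative updates. The dictionaries, variable names, and update rule here are illustrative assumptions for exposition; they are not the paper's joint-density RBM method.

```python
# Minimal sketch of exemplar-based sparse-representation voice conversion.
# Assumption: non-negative spectral features and parallel (frame-aligned)
# source/target dictionaries; activations are estimated with NMF-style
# multiplicative updates, so only a few exemplars receive large weights.
import numpy as np

def estimate_activations(x, A_src, n_iter=200, eps=1e-12):
    """Estimate non-negative activations h such that A_src @ h approximates x."""
    h = np.full(A_src.shape[1], 1.0 / A_src.shape[1])
    for _ in range(n_iter):
        h *= (A_src.T @ x) / (A_src.T @ (A_src @ h) + eps)
    return h

def convert_frame(x, A_src, B_tgt):
    """Convert one source frame by reusing its activations with the target dictionary."""
    h = estimate_activations(x, A_src)
    return B_tgt @ h

# Toy example: 5-dimensional features, parallel dictionaries of 8 exemplars.
rng = np.random.default_rng(0)
A_src = np.abs(rng.standard_normal((5, 8)))   # source-speaker exemplars
B_tgt = np.abs(rng.standard_normal((5, 8)))   # aligned target-speaker exemplars
x = A_src @ np.array([0.0, 0.9, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0])  # a source frame
y = convert_frame(x, A_src, B_tgt)            # converted, target-style frame
```

Because the source and target dictionaries are built from aligned parallel frames, the sparse activations estimated on the source side can be applied directly to the target dictionary; the paper's contribution is to learn such parallel dictionaries systematically rather than using raw training exemplars.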