Sparse nonlinear representation for voice conversion

Toru Nakashika, T. Takiguchi, Y. Ariki
{"title":"Sparse nonlinear representation for voice conversion","authors":"Toru Nakashika, T. Takiguchi, Y. Ariki","doi":"10.1109/ICME.2015.7177437","DOIUrl":null,"url":null,"abstract":"In voice conversion, sparse-representation-based methods have recently been garnering attention because they are, relatively speaking, not affected by over-fitting or over-smoothing problems. In these approaches, voice conversion is achieved by estimating a sparse vector that determines which dictionaries of the target speaker should be used, calculated from the matching of the input vector and dictionaries of the source speaker. The sparse-representation-based voice conversion methods can be broadly divided into two approaches: 1) an approach that uses raw acoustic features in the training data as parallel dictionaries, and 2) an approach that trains parallel dictionaries from the training data. In our approach, we follow the latter approach and systematically estimate the parallel dictionaries using a joint-density restricted Boltzmann machine with sparse constraints. Through voice-conversion experiments, we confirmed the high-performance of our method, comparing it with the conventional Gaussian mixture model (GMM)-based approach, and a non-negative matrix factorization (NMF)-based approach, which is based on sparse representation.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE International Conference on Multimedia and Expo (ICME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICME.2015.7177437","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

In voice conversion, sparse-representation-based methods have recently been garnering attention because they are relatively unaffected by over-fitting and over-smoothing problems. In these approaches, voice conversion is achieved by estimating a sparse vector that determines which dictionaries of the target speaker should be used; this vector is calculated by matching the input vector against the dictionaries of the source speaker. Sparse-representation-based voice conversion methods can be broadly divided into two approaches: 1) an approach that uses raw acoustic features in the training data as parallel dictionaries, and 2) an approach that trains parallel dictionaries from the training data. We follow the latter approach and systematically estimate the parallel dictionaries using a joint-density restricted Boltzmann machine with sparse constraints. Through voice-conversion experiments, we confirmed the high performance of our method, comparing it with the conventional Gaussian mixture model (GMM)-based approach and a sparse-representation-based non-negative matrix factorization (NMF) approach.
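
To make the dictionary mechanism concrete, the following is a minimal sketch of exemplar-based sparse voice conversion (the first approach listed above), assuming non-negative spectral features and NMF-style multiplicative updates. The dictionaries, variable names, and update rule here are illustrative assumptions for exposition; they are not the paper's joint-density RBM method.

```python
# Minimal sketch of exemplar-based sparse-representation voice conversion.
# Assumption: non-negative spectral features and parallel (frame-aligned)
# source/target dictionaries; activations are estimated with NMF-style
# multiplicative updates, so only a few exemplars receive large weights.
import numpy as np

def estimate_activations(x, A_src, n_iter=200, eps=1e-12):
    """Estimate non-negative activations h such that A_src @ h approximates x."""
    h = np.full(A_src.shape[1], 1.0 / A_src.shape[1])
    for _ in range(n_iter):
        h *= (A_src.T @ x) / (A_src.T @ (A_src @ h) + eps)
    return h

def convert_frame(x, A_src, B_tgt):
    """Convert one source frame by reusing its activations with the target dictionary."""
    h = estimate_activations(x, A_src)
    return B_tgt @ h

# Toy example: 5-dimensional features, parallel dictionaries of 8 exemplars.
rng = np.random.default_rng(0)
A_src = np.abs(rng.standard_normal((5, 8)))   # source-speaker exemplars
B_tgt = np.abs(rng.standard_normal((5, 8)))   # aligned target-speaker exemplars
x = A_src @ np.array([0.0, 0.9, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0])  # a source frame
y = convert_frame(x, A_src, B_tgt)            # converted, target-style frame
```

Because the source and target dictionaries are built from aligned parallel frames, the sparse activations estimated on the source side can be applied directly to the target dictionary; the paper's contribution is to learn such parallel dictionaries systematically rather than using raw training exemplars.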