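Singing Fundamental Frequency Contour Generation Using Generalized Command-Response Model and Score-Conditional Variational Autoencoder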

2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP). Pub Date: 2021-10-25. DOI: 10.1109/mlsp52302.2021.9596428

Shogo Seki, Haruka Taga, T. Toda

Singing Fundamental Frequency Contour Generation Using Generalized Command-Response Model and Score-Conditional Variational Autoencoder

This paper proposes a method for achieving physically motivated and interpretable control of fundamental frequency (F0) contour generation in singing aid systems for laryngectomees. A recently proposed variational autoencoder (VAE)-based method, VAE-SPACE, has successfully generated singing F0 contours from musical scores. However, VAE-SPACE can generate F0 contours that deviate from what is physically plausible. Moreover, to represent fluctuations in F0 contours, VAE-SPACE requires manual adjustment of the noise components that are fed in as input together with the musical scores. To address these issues, the proposed method 1) introduces a generalized command-response (GCR) model to represent an F0 contour as an approximation of a physical F0 production mechanism, and 2) employs a conditional VAE (CVAE) to treat the musical scores and the noise components separately. The experimental results show that the proposed method achieves performance comparable to that of VAE-SPACE without manual adjustment of the noise components, and that the trained GCR model makes it possible to control F0 contours more intuitively.
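The idea of representing an F0 contour as the response of a physical system to score-derived commands can be illustrated with a minimal sketch. The code below is not the paper's GCR model; it assumes a simpler, Fujisaki-style setup in which a stepwise note command (in log-F0) is smoothed by a critically damped second-order response g(t) = alpha^2 * t * exp(-alpha * t). The function names, the value of `alpha`, the frame period `dt`, and the three-note score are all hypothetical choices made for illustration.

```python
import math

def impulse_response(alpha, dt, length):
    """Critically damped second-order response g(t) = alpha^2 * t * exp(-alpha * t)."""
    return [alpha * alpha * (k * dt) * math.exp(-alpha * k * dt) for k in range(length)]

def command_response_contour(note_log_f0, frames_per_note, alpha=20.0, dt=0.005):
    """Smooth a stepwise score command into a continuous log-F0 contour.

    Assumption: every note lasts the same number of frames, and the
    response parameters are fixed by hand rather than learned.
    """
    # Stepwise command: one target log-F0 per frame, taken from the score.
    command = [v for v in note_log_f0 for _ in range(frames_per_note)]
    # Work relative to the first note so the contour starts at the first target.
    base = command[0]
    offset = [c - base for c in command]
    g = impulse_response(alpha, dt, length=200)
    contour = []
    for n in range(len(offset)):
        # Discrete convolution of the command with the response (rectangle rule).
        acc = 0.0
        for k in range(min(n + 1, len(g))):
            acc += g[k] * offset[n - k] * dt
        contour.append(base + acc)
    return contour

# Hypothetical three-note score: A3, C4, B3 (frequencies in Hz).
notes = [math.log(f) for f in (220.0, 261.63, 246.94)]
contour = command_response_contour(notes, frames_per_note=100)
```

In this sketch the contour glides between note targets instead of jumping, and changing `alpha` changes how quickly each target is reached, which is the kind of interpretable control a command-response representation is meant to offer. How the paper's GCR model is parameterized and trained together with the CVAE is not specified in the abstract and is not reproduced here.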
