Improving Singing Voice Separation Using Attribute-Aware Deep Network

2019 International Workshop on Multilayer Music Representation and Processing (MMRP) | Pub Date: 2019-01-01 | DOI: 10.1109/MMRP.2019.8665379

R. Swaminathan, Alexander Lerch


Citations: 6

Abstract

Singing Voice Separation (SVS) attempts to separate the predominant singing voice from a polyphonic musical mixture. In this paper, we investigate the effect of introducing attribute-specific information, namely frame-level vocal activity information, as an augmented feature input to a Deep Neural Network performing the separation. Our study considers two types of input: a ground-truth-based 'oracle' input, and labels extracted by a state-of-the-art model for singing voice activity detection in polyphonic music. We show that a separation network informed of vocal activity learns to differentiate between vocal and nonvocal regions. Such a network therefore reduces interference and artifacts more effectively than a network that is agnostic to this side information. Results on the MIR1K dataset show that informing the separation network of vocal activity improves the separation results consistently across all the measures used to evaluate separation quality.
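The abstract does not say how the vocal activity information is combined with the network's other inputs. As a minimal sketch of one plausible reading, the snippet below appends a single vocal-activity label to each frame of a magnitude spectrogram before it would be passed to a separation network. The function name `augment_with_vocal_activity`, the array shapes, and the toy data are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def augment_with_vocal_activity(frames, vocal_activity):
    """Append a per-frame vocal-activity label to each feature frame.

    frames: (n_frames, n_bins) magnitude spectrogram frames.
    vocal_activity: (n_frames,) values in [0, 1]; 1 means singing is present.
    Returns an array of shape (n_frames, n_bins + 1).
    """
    frames = np.asarray(frames, dtype=np.float32)
    labels = np.asarray(vocal_activity, dtype=np.float32).reshape(-1, 1)
    if labels.shape[0] != frames.shape[0]:
        raise ValueError("need exactly one vocal-activity label per frame")
    # The label becomes one extra input dimension for every frame.
    return np.concatenate([frames, labels], axis=1)


# Toy data (hypothetical): 4 frames with 3 frequency bins each.
frames = np.arange(12, dtype=np.float32).reshape(4, 3)
oracle_labels = np.array([0, 1, 1, 0])  # stand-in for ground-truth annotations
augmented = augment_with_vocal_activity(frames, oracle_labels)
print(augmented.shape)  # (4, 4)
```

In this sketch the 'oracle' case corresponds to passing ground-truth annotations as `vocal_activity`, while the second case in the abstract would pass the output of a separate vocal activity detector instead.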
