Improving Singing Voice Separation Using Attribute-Aware Deep Network

R. Swaminathan, Alexander Lerch
Published in: 2019 International Workshop on Multilayer Music Representation and Processing (MMRP), publication date 2019-01-01
DOI: 10.1109/MMRP.2019.8665379 (https://doi.org/10.1109/MMRP.2019.8665379)
Citations: 6

Abstract

Singing Voice Separation (SVS) attempts to separate the predominant singing voice from a polyphonic musical mixture. In this paper, we investigate the effect of introducing attribute-specific information, namely frame-level vocal activity information, as an augmented feature input to a Deep Neural Network performing the separation. Our study considers two types of inputs: a ground-truth-based ‘oracle’ input, and labels extracted by a state-of-the-art model for singing voice activity detection in polyphonic music. We show that a separation network informed of vocal activity learns to differentiate between vocal and nonvocal regions. Such a network therefore reduces interference and artifacts more effectively than a network that is agnostic to this side information. Results on the MIR1K dataset show that informing the separation network of vocal activity improves the separation results consistently across all the measures used to evaluate separation quality.
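
As a rough illustration of what "an augmented feature input" could look like, the sketch below appends a per-frame vocal activity label (1 = vocal, 0 = nonvocal) to each frame of mixture features before the frames would be passed to a separation network. The abstract does not say how the label is combined with the features, so the simple concatenation, the function name `augment_with_vocal_activity`, and the toy feature values are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def augment_with_vocal_activity(
    frames: Sequence[Sequence[float]],
    vocal_activity: Sequence[int],
) -> List[List[float]]:
    """Append one vocal-activity value to the end of every feature frame.

    frames: one feature vector per time frame (e.g. spectral magnitudes).
    vocal_activity: one label per frame, 1 for vocal and 0 for nonvocal.
    Concatenation is an assumed way of combining the two inputs.
    """
    if len(frames) != len(vocal_activity):
        raise ValueError("need exactly one vocal-activity label per frame")
    return [
        list(frame) + [float(label)]
        for frame, label in zip(frames, vocal_activity)
    ]


# Toy data (hypothetical values): 4 frames with 3 feature bins each.
mixture_frames = [
    [0.2, 0.1, 0.0],
    [0.9, 0.7, 0.3],
    [0.8, 0.6, 0.4],
    [0.1, 0.0, 0.1],
]
# Labels could come from ground truth ("oracle") or from a detector.
activity = [0, 1, 1, 0]

augmented = augment_with_vocal_activity(mixture_frames, activity)
for row in augmented:
    print(row)
```

The same function covers both input types considered in the study: only the source of `activity` changes, from ground-truth annotations to the output of a vocal activity detection model.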