Score-Informed Source Separation of Choral Music

International Society for Music Information Retrieval Conference. Pub Date: 2020-10-11. DOI: 10.5281/ZENODO.4245412

M. Gover, P. Depalle

Citations: 17

Abstract

Choral music recordings are a particularly challenging target for source separation due to the choral blend and the inherent acoustical complexity of the ‘choral timbre’. Due to the scarcity of publicly available multi-track choir recordings, we create a dataset of synthesized Bach chorales. We apply data augmentation to alter the chorales so that they more faithfully represent music from a broader range of choral genres. For separation we employ Wave-U-Net, a time-domain convolutional neural network (CNN) originally proposed for vocals and accompaniment separation. We show that Wave-U-Net outperforms a baseline implemented using score-informed NMF (non-negative matrix factorization). We introduce score-informed Wave-U-Net to incorporate the musical score into the separation process. We experiment with different score conditioning methods and show that conditioning on the score leads to improved separation results. We propose a ‘score-guided’ model variant in which separation is guided by the score alone, bypassing the need to specify the identity of the extracted source. Finally, we evaluate our models (trained on synthetic data only) on real choir recordings and find that in the absence of a large training set of real recordings, NMF still performs better than Wave-U-Net in this setting. To our knowledge, this paper is the first to study source separation of choral music.
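The abstract does not describe how the score-informed NMF baseline is built. As a rough illustration of the general idea only, the sketch below factorizes a magnitude spectrogram V as W·H with multiplicative updates, and uses a score-derived mask to force each component's activations to zero wherever the score says it is silent. The function names, the toy spectra, the two "voices", and the choice of a Euclidean cost are all assumptions made for this example and are not taken from the paper. Plain Python lists are used to keep the sketch dependency-free.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def score_informed_nmf(v, score_mask, n_iter=200, eps=1e-9, seed=0):
    """Factorize v (freq x time) as w @ h, with h constrained by the score.

    score_mask is (components x time): 1 where the score allows the
    component to sound, 0 where the score says it is silent.
    """
    rng = random.Random(seed)
    n_freq, n_time, n_comp = len(v), len(v[0]), len(score_mask)
    w = [[rng.random() + eps for _ in range(n_comp)] for _ in range(n_freq)]
    # Activations start at exactly zero wherever the score is silent.
    h = [[m * (rng.random() + eps) for m in row] for row in score_mask]
    for _ in range(n_iter):
        # Multiplicative update for h (Euclidean cost); zeros stay zero.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][t] * num[k][t] / (den[k][t] + eps) for t in range(n_time)]
             for k in range(n_comp)]
        # Multiplicative update for w.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(matmul(w, h), ht)
        w = [[w[f][k] * num[f][k] / (den[f][k] + eps) for k in range(n_comp)]
             for f in range(n_freq)]
    return w, h


def separate(v, w, h, comps, eps=1e-9):
    """Soft-mask v so that only the listed components are kept."""
    full = matmul(w, h)
    part = matmul([[row[k] for k in comps] for row in w], [h[k] for k in comps])
    return [[v[f][t] * part[f][t] / (full[f][t] + eps) for t in range(len(v[0]))]
            for f in range(len(v))]


# Toy data (hypothetical): two overlapping spectra, 4 bins x 6 frames.
spec_a = [1.0, 0.6, 0.1, 0.0]
spec_b = [0.0, 0.2, 0.7, 1.0]
score_mask = [[1, 1, 1, 1, 0, 0],   # voice A sounds in frames 0-3
              [0, 0, 1, 1, 1, 1]]   # voice B sounds in frames 2-5
v = [[spec_a[f] * score_mask[0][t] + spec_b[f] * score_mask[1][t]
      for t in range(6)] for f in range(4)]

w, h = score_informed_nmf(v, score_mask)
voice_a = separate(v, w, h, comps=[0])
voice_b = separate(v, w, h, comps=[1])
```

Because the updates are multiplicative, an activation that starts at zero can never become non-zero, which is what lets the score act as a hard constraint. In a practical system each voice would typically get one component per pitch, and the soft mask would be applied to a complex spectrogram before inversion; both details are left out here.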
