基于cnn的基于合成数据增强的音符起始检测

2020 28th European Signal Processing Conference (EUSIPCO) Pub Date : 2021-01-24 DOI:10.23919/Eusipco47968.2020.9287621

Mina Mounir, P. Karsmakers, T. Waterschoot

{"title":"基于cnn的基于合成数据增强的音符起始检测","authors":"Mina Mounir, P. Karsmakers, T. Waterschoot","doi":"10.23919/Eusipco47968.2020.9287621","DOIUrl":null,"url":null,"abstract":"Detecting the onset of notes in music excerpts is a fundamental problem in many music signal processing tasks, including analysis, synthesis, and information retrieval. When addressing the note onset detection (NOD) problem using a data-driven methodology, a major challenge is the availability and quality of labeled datasets used for both model training/tuning and evaluation. As most of the available datasets are manually annotated, the amount of annotated music excerpts is limited and the annotation strategy and quality varies across data sets. To counter both problems, in this paper we propose to use semi-synthetic datasets where the music excerpts are mixes of isolated note recordings. The advantage resides in the annotations being automatically generated while mixing the notes, as isolated note onsets are straightforward to detect using a simple energy measure. A semi-synthetic dataset is used in this work for augmenting a real piano dataset when training a convolutional Neural Network (CNN) with three novel model training strategies. Training the CNN on a semi-synthetic dataset and retraining only the CNN classification layers on a real dataset results in higher average F1-score (F1) scores with lower variance.","PeriodicalId":6705,"journal":{"name":"2020 28th European Signal Processing Conference (EUSIPCO)","volume":"89 1","pages":"171-175"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"CNN-based Note Onset Detection using Synthetic Data Augmentation\",\"authors\":\"Mina Mounir, P. Karsmakers, T. Waterschoot\",\"doi\":\"10.23919/Eusipco47968.2020.9287621\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Detecting the onset of notes in music excerpts is a fundamental problem in many music signal processing tasks, including analysis, synthesis, and information retrieval. When addressing the note onset detection (NOD) problem using a data-driven methodology, a major challenge is the availability and quality of labeled datasets used for both model training/tuning and evaluation. As most of the available datasets are manually annotated, the amount of annotated music excerpts is limited and the annotation strategy and quality varies across data sets. To counter both problems, in this paper we propose to use semi-synthetic datasets where the music excerpts are mixes of isolated note recordings. The advantage resides in the annotations being automatically generated while mixing the notes, as isolated note onsets are straightforward to detect using a simple energy measure. A semi-synthetic dataset is used in this work for augmenting a real piano dataset when training a convolutional Neural Network (CNN) with three novel model training strategies. Training the CNN on a semi-synthetic dataset and retraining only the CNN classification layers on a real dataset results in higher average F1-score (F1) scores with lower variance.\",\"PeriodicalId\":6705,\"journal\":{\"name\":\"2020 28th European Signal Processing Conference (EUSIPCO)\",\"volume\":\"89 1\",\"pages\":\"171-175\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-01-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 28th European Signal Processing Conference (EUSIPCO)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.23919/Eusipco47968.2020.9287621\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 28th European Signal Processing Conference (EUSIPCO)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/Eusipco47968.2020.9287621","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

在许多音乐信号处理任务中，包括分析、合成和信息检索，检测音乐片段中音符的开始是一个基本问题。当使用数据驱动的方法解决音符起始检测(NOD)问题时，主要的挑战是用于模型训练/调优和评估的标记数据集的可用性和质量。由于大多数可用的数据集都是手动注释的，因此注释的音乐节选数量有限，并且注释策略和质量因数据集而异。为了解决这两个问题，在本文中，我们建议使用半合成数据集，其中音乐摘录是孤立音符录音的混合。它的优点在于，注释是在混合音符时自动生成的，因为使用简单的能量度量可以直接检测到孤立的音符发作。本研究使用半合成数据集来增强真实钢琴数据集，并使用三种新颖的模型训练策略训练卷积神经网络(CNN)。在半合成数据集上训练CNN，在真实数据集上只训练CNN分类层，结果是平均F1得分更高，方差更小。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

CNN-based Note Onset Detection using Synthetic Data Augmentation

Detecting the onset of notes in music excerpts is a fundamental problem in many music signal processing tasks, including analysis, synthesis, and information retrieval. When addressing the note onset detection (NOD) problem using a data-driven methodology, a major challenge is the availability and quality of labeled datasets used for both model training/tuning and evaluation. As most of the available datasets are manually annotated, the amount of annotated music excerpts is limited and the annotation strategy and quality varies across data sets. To counter both problems, in this paper we propose to use semi-synthetic datasets where the music excerpts are mixes of isolated note recordings. The advantage resides in the annotations being automatically generated while mixing the notes, as isolated note onsets are straightforward to detect using a simple energy measure. A semi-synthetic dataset is used in this work for augmenting a real piano dataset when training a convolutional Neural Network (CNN) with three novel model training strategies. Training the CNN on a semi-synthetic dataset and retraining only the CNN classification layers on a real dataset results in higher average F1-score (F1) scores with lower variance.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2020 28th European Signal Processing Conference (EUSIPCO)

自引率

0.00%

发文量