{"title":"CNN-based Note Onset Detection using Synthetic Data Augmentation","authors":"Mina Mounir, P. Karsmakers, T. Waterschoot","doi":"10.23919/Eusipco47968.2020.9287621","DOIUrl":null,"url":null,"abstract":"Detecting the onset of notes in music excerpts is a fundamental problem in many music signal processing tasks, including analysis, synthesis, and information retrieval. When addressing the note onset detection (NOD) problem using a data-driven methodology, a major challenge is the availability and quality of labeled datasets used for both model training/tuning and evaluation. As most of the available datasets are manually annotated, the amount of annotated music excerpts is limited and the annotation strategy and quality varies across data sets. To counter both problems, in this paper we propose to use semi-synthetic datasets where the music excerpts are mixes of isolated note recordings. The advantage resides in the annotations being automatically generated while mixing the notes, as isolated note onsets are straightforward to detect using a simple energy measure. A semi-synthetic dataset is used in this work for augmenting a real piano dataset when training a convolutional Neural Network (CNN) with three novel model training strategies. Training the CNN on a semi-synthetic dataset and retraining only the CNN classification layers on a real dataset results in higher average F1-score (F1) scores with lower variance.","PeriodicalId":6705,"journal":{"name":"2020 28th European Signal Processing Conference (EUSIPCO)","volume":"89 1","pages":"171-175"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 28th European Signal Processing Conference (EUSIPCO)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/Eusipco47968.2020.9287621","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
Abstract
Detecting the onset of notes in music excerpts is a fundamental problem in many music signal processing tasks, including analysis, synthesis, and information retrieval. When addressing the note onset detection (NOD) problem using a data-driven methodology, a major challenge is the availability and quality of labeled datasets used for both model training/tuning and evaluation. As most of the available datasets are manually annotated, the amount of annotated music excerpts is limited and the annotation strategy and quality varies across data sets. To counter both problems, in this paper we propose to use semi-synthetic datasets where the music excerpts are mixes of isolated note recordings. The advantage resides in the annotations being automatically generated while mixing the notes, as isolated note onsets are straightforward to detect using a simple energy measure. A semi-synthetic dataset is used in this work for augmenting a real piano dataset when training a convolutional Neural Network (CNN) with three novel model training strategies. Training the CNN on a semi-synthetic dataset and retraining only the CNN classification layers on a real dataset results in higher average F1-score (F1) scores with lower variance.