Ilias Theodorakopoulos, G. Economou, S. Fotopoulos
Unsupervised music segmentation via multi-scale processing of compressive features' representation
2013 18th International Conference on Digital Signal Processing (DSP), 2013-07-01
DOI: https://doi.org/10.1109/ICDSP.2013.6622772
Citations: 4
Abstract
We present an automated method for the unsupervised detection of structural boundaries in musical recordings. The method compresses features capturing timbre and chroma into a 1-D time series derived via PCA. Time-delay embedding and multi-scale comparison using the Wald-Wolfowitz statistical test are then used to compute a self-dissimilarity matrix. A novelty curve is estimated by convolving an appropriate kernel along the main diagonal of the matrix, and the structural boundaries are placed at the local maxima of that curve. We evaluate the method on a popular dataset, using two different ground truth annotations. The results show that the 1-D compressed representation retains enough information to detect boundaries with high precision, outperforming several methods from the literature.
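The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration: the function names, the embedding dimension and lag, the kernel size, and the use of a plain checkerboard kernel are all hypothetical. Most importantly, the pairwise comparison here is a simple Euclidean distance between embedded points, a stand-in for the multi-scale Wald-Wolfowitz test, which the abstract does not specify in enough detail to reproduce.

```python
import numpy as np

def pca_1d(features):
    """Project (n_frames, n_dims) features onto the first principal component."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]

def delay_embed(series, dim, lag):
    """Time-delay embedding: row i is (x[i], x[i+lag], ..., x[i+(dim-1)*lag])."""
    n = len(series) - (dim - 1) * lag
    return np.stack([series[k * lag : k * lag + n] for k in range(dim)], axis=1)

def self_dissimilarity(points):
    """Pairwise Euclidean distances (assumed stand-in, NOT the Wald-Wolfowitz test)."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def checkerboard_kernel(half):
    """Kernel of size 2*half: +1 on cross-segment blocks, -1 on within-segment blocks."""
    sign = np.concatenate([-np.ones(half), np.ones(half)])
    return -np.outer(sign, sign)

def novelty_curve(sdm, half):
    """Slide the kernel along the main diagonal of the dissimilarity matrix."""
    kernel = checkerboard_kernel(half)
    n = sdm.shape[0]
    novelty = np.zeros(n)
    for t in range(half, n - half + 1):
        window = sdm[t - half : t + half, t - half : t + half]
        novelty[t] = (window * kernel).sum()
    return novelty

def local_maxima(curve, threshold):
    """Indices of local maxima of the curve that exceed the threshold."""
    return [t for t in range(1, len(curve) - 1)
            if curve[t] > threshold
            and curve[t] >= curve[t - 1]
            and curve[t] > curve[t + 1]]

# Synthetic example: two 40-frame "sections" with different feature means.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(0.0, 0.05, (40, 3)),
                      rng.normal(5.0, 0.05, (40, 3))])
series = pca_1d(features)
points = delay_embed(series, dim=3, lag=1)
sdm = self_dissimilarity(points)
novelty = novelty_curve(sdm, half=8)
boundaries = local_maxima(novelty, threshold=0.5 * novelty.max())
print(boundaries)
```

On this synthetic input the novelty curve should peak close to frame 40, where the two sections meet. Real recordings would need timbre and chroma features extracted from audio, and the parameter values above would have to be tuned rather than taken as given.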