Enhancing timbre model using MFCC and its time derivatives for music similarity estimation

2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO). Pub Date: 2012-10-18. DOI: 10.5281/ZENODO.42839

Franz A. de Leon, K. Martinez

Citations: 16

Abstract

A popular method for content-based music similarity estimation is to model timbre by fitting a single multivariate Gaussian with a full covariance matrix to the MFCCs, then to compare tracks using the symmetric Kullback-Leibler divergence. Borrowing from the field of speech recognition, we propose applying the same approach to the MFCCs' time derivatives to enhance the timbre model. Gaussian models of the delta and acceleration coefficients are used to create their respective distance matrices. The distance matrices are then combined linearly to form a full distance matrix for music similarity estimation. In our experiments on two datasets, our novel approach performs better than using MFCCs alone. Moreover, genre classification using k-NN yields accuracies that are already close to the state of the art.
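The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes each track's MFCCs are already available as a NumPy array of shape (frames, coefficients), and the function names, the regression window `width`, and the combination `weights` are all hypothetical choices that the abstract does not specify. The divergence used here is the sum KL(p‖q) + KL(q‖p) in closed form for Gaussians.

```python
import numpy as np


def delta(features, width=2):
    """Regression-based time derivative along the frame axis (rows).

    `width` is an assumed window half-length; edge frames are padded by repetition.
    """
    n = len(features)
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for k in range(1, width + 1):
        ahead = padded[width + k : width + k + n]
        behind = padded[width - k : width - k + n]
        out += k * (ahead - behind)
    return out / denom


def fit_gaussian(features):
    """Single multivariate Gaussian with full covariance: (mean, covariance)."""
    return features.mean(axis=0), np.cov(features, rowvar=False)


def symmetric_kl(g1, g2):
    """KL(g1 || g2) + KL(g2 || g1) for two Gaussians; log-determinant terms cancel."""
    (m1, c1), (m2, c2) = g1, g2
    ic1, ic2 = np.linalg.inv(c1), np.linalg.inv(c2)
    d = m1 - m2
    k = len(m1)
    traces = np.trace(ic2 @ c1) + np.trace(ic1 @ c2)
    return 0.5 * traces - k + 0.5 * d @ (ic1 + ic2) @ d


def distance_matrix(models):
    """Pairwise symmetric KL divergence between a list of Gaussian models."""
    n = len(models)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = symmetric_kl(models[i], models[j])
    return dist


def combined_distance(mfcc_per_track, weights=(0.6, 0.2, 0.2)):
    """Linear combination of the MFCC, delta and acceleration distance matrices.

    The weights are placeholders for illustration, not values from the paper.
    """
    deltas = [delta(m) for m in mfcc_per_track]
    accels = [delta(d) for d in deltas]
    streams = (mfcc_per_track, deltas, accels)
    matrices = [distance_matrix([fit_gaussian(f) for f in s]) for s in streams]
    return sum(w * m for w, m in zip(weights, matrices))
```

Because the three distance matrices can have very different scales, some form of normalisation before the weighted sum may be needed in practice; the abstract does not say how the authors handled this, so it is left out of the sketch.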



