Detecting Cover Songs with Pitch Class Key-Invariant Networks

2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP)
Pub Date: 2021-10-25
DOI: 10.1109/mlsp52302.2021.9596389

K. O'Hanlon, Emmanouil Benetos, S. Dixon

Citations: 3

Abstract

Deep Learning (DL) has recently been applied successfully to the task of Cover Song Identification (CSI). Meanwhile, neural networks whose design takes the structure of music signal data into account have been developed. In this paper, we propose a Pitch Class Key-Invariant Network, PiCKINet, for CSI. Like some other CSI networks, PiCKINet takes a Constant-Q Transform (CQT) pitch feature as input. Unlike those networks, it uses large multi-octave kernels to produce a latent representation with pitch-class dimensions, and these dimensions are maintained throughout PiCKINet by key-invariant convolutions. PiCKINet is found to be more effective, and more efficient, than other CQT-based networks. We also propose an extended variant, PiCKINet+, that employs a centre loss penalty, squeeze-and-excitation units, and octave-swapping data augmentation. When tested on a set of ~16K tracks, PiCKINet+ achieves a relative improvement of ~17% in mean average precision (MAP) over the well-known CQTNet.
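The key-invariance idea can be illustrated with a small NumPy sketch. It rests on assumptions that the abstract does not state: the CQT has 12 bins per octave, the learned multi-octave kernels are replaced by fixed octave folding, and "key-invariant convolution" is taken to mean a convolution that wraps circularly around the 12 pitch-class rows, so that transposing the input rotates the feature map rather than pushing content off its edge. The names `fold_octaves`, `key_invariant_conv` and `embed`, and the max-over-pitch-class pooling, are hypothetical illustration choices, not the PiCKINet implementation.

```python
import numpy as np

N_PC = 12  # pitch classes per octave (assumes 12 CQT bins per octave)


def fold_octaves(cqt):
    """Hypothetical stand-in for the learned multi-octave kernels: sum the CQT
    bins that share a pitch class, giving a (12, frames) representation."""
    n_bins, n_frames = cqt.shape
    assert n_bins % N_PC == 0, "expects a whole number of octaves"
    return cqt.reshape(n_bins // N_PC, N_PC, n_frames).sum(axis=0)


def key_invariant_conv(x, kernel):
    """2-D correlation over (pitch class, time) that wraps circularly around
    the pitch-class axis and uses 'valid' extent along time."""
    n_pc, n_frames = x.shape
    k_pc, k_t = kernel.shape
    out_t = n_frames - k_t + 1
    out = np.zeros((n_pc, out_t))
    for dp in range(k_pc):
        # shifted[i] == x[(i + dp) % n_pc]: wrap-around instead of zero padding
        shifted = np.roll(x, -dp, axis=0)
        for dt in range(k_t):
            out += kernel[dp, dt] * shifted[:, dt:dt + out_t]
    return out


def embed(cqt, kernels):
    """Toy track embedding: one value per kernel, pooled by max over pitch
    classes and mean over time, which discards the key (transposition)."""
    pc = fold_octaves(cqt)
    feats = [np.maximum(key_invariant_conv(pc, k), 0.0) for k in kernels]
    return np.array([f.max(axis=0).mean() for f in feats])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cqt = rng.random((84, 200))               # random stand-in: 7 octaves x 200 frames
    kernels = rng.standard_normal((8, 3, 5))  # 8 hypothetical 3x5 kernels
    a = embed(cqt, kernels)
    b = embed(np.roll(cqt, 5, axis=0), kernels)  # idealised 5-semitone transposition
    print(np.allclose(a, b))  # True
```

Because the convolution wraps around the pitch-class axis, rotating its input by s pitch classes rotates its output by the same s, so the pitch-class structure survives every layer; a zero-padded convolution would lose this property at the edges. Note that `np.roll` over the whole CQT is an idealisation: a real transposition shifts bins off the ends of the CQT rather than wrapping them.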
