Music auto-tagging in the long tail: A few-shot approach

arXiv - EE - Audio and Speech Processing Pub Date : 2024-09-12 DOI:arxiv-2409.07730

T. Aleksandra Ma, Alexander Lerch

{"title":"Music auto-tagging in the long tail: A few-shot approach","authors":"T. Aleksandra Ma, Alexander Lerch","doi":"arxiv-2409.07730","DOIUrl":null,"url":null,"abstract":"In the realm of digital music, using tags to efficiently organize and\nretrieve music from extensive databases is crucial for music catalog owners.\nHuman tagging by experts is labor-intensive but mostly accurate, whereas\nautomatic tagging through supervised learning has approached satisfying\naccuracy but is restricted to a predefined set of training tags. Few-shot\nlearning offers a viable solution to expand beyond this small set of predefined\ntags by enabling models to learn from only a few human-provided examples to\nunderstand tag meanings and subsequently apply these tags autonomously. We\npropose to integrate few-shot learning methodology into multi-label music\nauto-tagging by using features from pre-trained models as inputs to a\nlightweight linear classifier, also known as a linear probe. We investigate\ndifferent popular pre-trained features, as well as different few-shot\nparametrizations with varying numbers of classes and samples per class. Our\nexperiments demonstrate that a simple model with pre-trained features can\nachieve performance close to state-of-the-art models while using significantly\nless training data, such as 20 samples per tag. Additionally, our linear probe\nperforms competitively with leading models when trained on the entire training\ndataset. The results show that this transfer learning-based few-shot approach\ncould effectively address the issue of automatically assigning long-tail tags\nwith only limited labeled data.","PeriodicalId":501284,"journal":{"name":"arXiv - EE - Audio and Speech Processing","volume":"75 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - EE - Audio and Speech Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07730","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

Abstract

In the realm of digital music, using tags to efficiently organize and retrieve music from extensive databases is crucial for music catalog owners. Human tagging by experts is labor-intensive but mostly accurate, whereas automatic tagging through supervised learning has approached satisfying accuracy but is restricted to a predefined set of training tags. Few-shot learning offers a viable solution to expand beyond this small set of predefined tags by enabling models to learn from only a few human-provided examples to understand tag meanings and subsequently apply these tags autonomously. We propose to integrate few-shot learning methodology into multi-label music auto-tagging by using features from pre-trained models as inputs to a lightweight linear classifier, also known as a linear probe. We investigate different popular pre-trained features, as well as different few-shot parametrizations with varying numbers of classes and samples per class. Our experiments demonstrate that a simple model with pre-trained features can achieve performance close to state-of-the-art models while using significantly less training data, such as 20 samples per tag. Additionally, our linear probe performs competitively with leading models when trained on the entire training dataset. The results show that this transfer learning-based few-shot approach could effectively address the issue of automatically assigning long-tail tags with only limited labeled data.

查看原文本刊更多论文

长尾中的音乐自动标记：寥寥数语的方法

在数字音乐领域，使用标签从庞大的数据库中有效地组织和检索音乐对音乐目录所有者来说至关重要。由专家进行人工标记虽然耗费大量人力，但基本都很准确，而通过监督学习进行的自动标记已接近令人满意的准确度，但仅限于一组预定义的训练标签。少量标记学习提供了一种可行的解决方案，它能让模型仅从少量人类提供的示例中学习理解标记含义，然后自主应用这些标记，从而超越这一小批预定义标记的范围。我们建议将少量学习方法集成到多标签音乐自动标记中，将来自预训练模型的特征作为轻量级线性分类器（也称为线性探针）的输入。我们研究了不同的流行预训练特征，以及不同的少拍参数三元组，每类的类数和样本数也各不相同。我们的实验证明，使用预训练特征的简单模型可以达到接近最先进模型的性能，而使用的训练数据却少得多，比如每个标签只需 20 个样本。此外，当在整个训练数据集上进行训练时，我们的线性概率模型的表现与领先模型不相上下。研究结果表明，这种基于迁移学习的 "小试牛刀 "方法可以有效地解决仅使用有限标签数据自动分配长尾标签的问题。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

arXiv - EE - Audio and Speech Processing

自引率

0.00%

发文量