Music auto-tagging in the long tail: A few-shot approach
T. Aleksandra Ma, Alexander Lerch
arXiv - EE - Audio and Speech Processing, 2024-09-12, arxiv-2409.07730
In the realm of digital music, using tags to efficiently organize and
retrieve music from extensive databases is crucial for music catalog owners.
Human tagging by experts is labor-intensive but mostly accurate, whereas
automatic tagging through supervised learning has approached satisfactory
accuracy but is restricted to a predefined set of training tags. Few-shot
learning offers a way to expand beyond this small set of predefined tags: a
model learns what a tag means from only a few human-provided examples and
then applies that tag autonomously. We
propose to integrate few-shot learning methodology into multi-label music
auto-tagging by using features from pre-trained models as inputs to a
lightweight linear classifier, also known as a linear probe. We investigate
several popular pre-trained features, as well as few-shot parametrizations
that vary the number of classes and the number of samples per class. Our
experiments demonstrate that a simple model with pre-trained features can
achieve performance close to that of state-of-the-art models while using
significantly less training data, for example 20 samples per tag. Additionally,
our linear probe
performs competitively with leading models when trained on the entire training
dataset. The results show that this transfer-learning-based few-shot approach
could effectively address the issue of automatically assigning long-tail tags
with only limited labeled data.
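
To make the idea of a few-shot linear probe concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The embeddings are synthetic stand-ins for features from a pre-trained model, and every name (make_toy_clips, sample_support, train_linear_probe, predict) as well as the learning rate, epoch count, and k = 5 are assumptions chosen for illustration. The sketch draws a small support set with up to k positive clips per tag, then fits one independent logistic unit per tag on the frozen features, which is one common way to set up a multi-label linear probe.

```python
import math
import random

def make_toy_clips(n_clips, rng):
    """Synthetic stand-in for frozen pre-trained embeddings (NOT real audio features).

    Dimension d of an embedding is about 2.0 when the clip carries tag d, else about 0.
    """
    tag_patterns = [{0}, {1}, {2}, {0, 1}]  # the last pattern makes the task multi-label
    features, labels = [], []
    for i in range(n_clips):
        tags = tag_patterns[i % len(tag_patterns)]
        features.append([(2.0 if d in tags else 0.0) + rng.gauss(0.0, 0.02)
                         for d in range(3)])
        labels.append(tags)
    return features, labels

def sample_support(labels, n_tags, k, rng):
    """Draw up to k positive clips per tag; return the sorted union of their indices."""
    chosen = set()
    for tag in range(n_tags):
        positives = [i for i, tags in enumerate(labels) if tag in tags]
        chosen.update(rng.sample(positives, min(k, len(positives))))
    return sorted(chosen)

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(weights, biases, x):
    """Return one independent probability per tag for a single embedding x."""
    return [sigmoid(sum(w * xi for w, xi in zip(w_tag, x)) + b)
            for w_tag, b in zip(weights, biases)]

def train_linear_probe(features, labels, n_tags, lr=0.3, epochs=500):
    """Full-batch gradient descent on per-tag binary cross-entropy; features stay fixed."""
    dim, n = len(features[0]), len(features)
    weights = [[0.0] * dim for _ in range(n_tags)]
    biases = [0.0] * n_tags
    for _ in range(epochs):
        grad_w = [[0.0] * dim for _ in range(n_tags)]
        grad_b = [0.0] * n_tags
        for x, tags in zip(features, labels):
            probs = predict(weights, biases, x)
            for t in range(n_tags):
                err = probs[t] - (1.0 if t in tags else 0.0)
                grad_b[t] += err
                for d in range(dim):
                    grad_w[t][d] += err * x[d]
        for t in range(n_tags):
            biases[t] -= lr * grad_b[t] / n
            for d in range(dim):
                weights[t][d] -= lr * grad_w[t][d] / n
    return weights, biases

rng = random.Random(0)
features, labels = make_toy_clips(40, rng)
support = sample_support(labels, n_tags=3, k=5, rng=rng)
weights, biases = train_linear_probe([features[i] for i in support],
                                     [labels[i] for i in support], n_tags=3)
# Query an idealized "tag 2 only" embedding; tag 2 should get the highest probability.
print(predict(weights, biases, [0.0, 0.0, 2.0]))
```

In a real setting the toy embeddings would be replaced by features extracted once from a pre-trained audio model, and only the small weight matrix would be trained. Because each tag has its own sigmoid output, adding a new long-tail tag only requires a handful of labeled clips and one extra weight vector.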