LC-Protonets: Multi-Label Few-Shot Learning for World Music Audio Tagging

Impact factor: 2.7. JCR: Q2 (Engineering, Electrical & Electronic)

IEEE Open Journal of Signal Processing. Pub Date: 2025-01-13. DOI: 10.1109/OJSP.2025.3529315

Charilaos Papaioannou; Emmanouil Benetos; Alexandros Potamianos

Volume 6, pages 138-146. Publication type: Journal Article. Full text: https://ieeexplore.ieee.org/document/10839319/

Citations: 0

Abstract



LC-Protonets: Multi-Label Few-Shot Learning for World Music Audio Tagging

We introduce Label-Combination Prototypical Networks (LC-Protonets) to address the problem of multi-label few-shot classification, where a model must generalize to new classes based on only a few available examples. Extending Prototypical Networks, LC-Protonets generate one prototype per label combination, derived from the power set of labels present in the limited training items, rather than one prototype per label. Our method is applied to automatic audio tagging across diverse music datasets, covering various cultures and including both modern and traditional music, and is evaluated against existing approaches in the literature. The results demonstrate a significant performance improvement in almost all domains and training setups when using LC-Protonets for multi-label classification. In addition to training a few-shot learning model from scratch, we explore the use of a pre-trained model, obtained via supervised learning, to embed items in the feature space. Fine-tuning improves the generalization ability of all methods, yet LC-Protonets achieve high-level performance even without fine-tuning, in contrast to the comparative approaches. We finally analyze the scalability of the proposed method, providing detailed quantitative metrics from our experiments. The implementation and experimental setup are made publicly available, offering a benchmark for future research.
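To make the label-combination idea in the abstract more concrete, here is a minimal sketch in Python. It is not the authors' released implementation. The function names, the toy 2-D embeddings, and the tag names are hypothetical. The sketch also assumes that a prototype is the mean embedding of its supporting items and that queries are matched by Euclidean distance, as in standard Prototypical Networks, and that a support item contributes to every label combination contained in its own label set.

```python
from itertools import combinations

import numpy as np


def label_combinations(label_sets):
    """Collect every non-empty subset of each support item's label set.

    This is the union, over items, of each item's power set minus the empty set.
    """
    combos = set()
    for labels in label_sets:
        ordered = sorted(labels)
        for size in range(1, len(ordered) + 1):
            combos.update(frozenset(c) for c in combinations(ordered, size))
    return combos


def build_prototypes(embeddings, label_sets):
    """Return {label combination: mean embedding of the items that carry it}.

    Assumption: an item supports a combination when the combination is a
    subset of the item's labels.
    """
    prototypes = {}
    for combo in label_combinations(label_sets):
        members = [i for i, labels in enumerate(label_sets) if combo <= set(labels)]
        prototypes[combo] = embeddings[members].mean(axis=0)
    return prototypes


def predict(query, prototypes):
    """Return the label combination whose prototype is nearest to the query."""
    return min(prototypes, key=lambda c: np.linalg.norm(query - prototypes[c]))


# Toy support set: three items with hypothetical tags and 2-D embeddings.
support_embeddings = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 0.0]])
support_labels = [{"flute"}, {"flute", "drums"}, {"drums"}]

prototypes = build_prototypes(support_embeddings, support_labels)
predicted = predict(np.array([2.1, 0.0]), prototypes)
print(sorted(predicted))
```

In this toy setup a single nearest-prototype lookup returns a whole set of tags, whereas a one-prototype-per-label scheme needs an additional decision rule (for example a distance threshold) to output more than one label. The number of prototypes grows with the number of distinct label combinations, which is why the abstract's scalability analysis matters.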


Source journal: IEEE Open Journal of Signal Processing

CiteScore: 5.30

Self-citation rate: 0.00%

Articles published: not listed

Review time: 22 weeks