MetaAudio: A Few-Shot Audio Classification Benchmark

Artificial neural networks, ICANN : international conference ... proceedings. International Conference on Artificial Neural Networks (European Neural Network Society). Pub Date: 2022-04-05. DOI: 10.48550/arXiv.2204.02121

Calum Heggan, S. Budgett, Timothy M. Hospedales, Mehrdad Yaghoobi


Citation count: 11

Abstract

Currently available benchmarks for few-shot learning (machine learning with few training examples) are limited in the domains they cover, primarily focusing on image classification. This work aims to alleviate this reliance on image-based benchmarks by offering the first comprehensive, public and fully reproducible audio-based alternative, covering a variety of sound domains and experimental settings. We compare the few-shot classification performance of a variety of techniques on seven audio datasets (spanning environmental sounds to human speech). Extending this, we carry out in-depth analyses of joint training (where all datasets are used during training) and cross-dataset adaptation protocols, establishing the possibility of a generalised audio few-shot classification algorithm. Our experimentation shows that gradient-based meta-learning methods such as MAML and Meta-Curvature consistently outperform both metric and baseline methods. We also demonstrate that the joint training routine helps overall generalisation for the environmental sound datasets included, as well as being a somewhat effective method of tackling the cross-dataset/domain setting.
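For readers unfamiliar with how few-shot classification is usually scored, the sketch below illustrates one common setup: episodes of N classes with K labelled support examples each, evaluated on held-out query examples. The abstract does not state the episode configuration, so the 5-way 1-shot setting, the function names `sample_episode` and `nearest_centroid_accuracy`, and the synthetic feature vectors are all assumptions made for illustration. The nearest-centroid classifier is a simple stand-in for the metric family of methods; it is not MAML, Meta-Curvature, or the authors' implementation.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Sample one N-way K-shot episode as (index, label) pairs.

    Hypothetical helper: picks n_way classes, then k_shot support and
    n_query query examples per class, with no overlap between the two.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # Only classes with enough examples can take part in an episode.
    eligible = sorted(c for c, idxs in by_class.items()
                      if len(idxs) >= k_shot + n_query)
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(by_class[c], k_shot + n_query)
        support += [(i, c) for i in picked[:k_shot]]
        query += [(i, c) for i in picked[k_shot:]]
    return support, query


def nearest_centroid_accuracy(features, support, query):
    """Classify each query by its closest class centroid (squared Euclidean)."""
    sums, counts = {}, defaultdict(int)
    for i, c in support:
        vec = features[i]
        sums[c] = list(vec) if c not in sums else [a + b for a, b in zip(sums[c], vec)]
        counts[c] += 1
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}

    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    correct = 0
    for i, c in query:
        pred = min(centroids, key=lambda k: sq_dist(features[i], centroids[k]))
        correct += pred == c
    return correct / len(query)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for audio embeddings: 10 classes, 20 clips each,
    # one well-separated cluster per class. Not real data.
    labels = [c for c in range(10) for _ in range(20)]
    features = [[(5.0 if d == c else 0.0) + rng.gauss(0, 0.5) for d in range(10)]
                for c in labels]
    accuracies = []
    for _ in range(100):
        support, query = sample_episode(labels, n_way=5, k_shot=1, n_query=5, rng=rng)
        accuracies.append(nearest_centroid_accuracy(features, support, query))
    print("mean episode accuracy:", sum(accuracies) / len(accuracies))
```

Averaging accuracy over many sampled episodes, as in the final loop, is the usual way such a score is reported; a gradient-based method would replace the centroid step with a few adaptation steps on the support set.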

