Comparison of self-supervised in-domain and supervised out-domain transfer learning for bird species recognition

arXiv - CS - Sound | Pub Date: 2024-04-26 | DOI: arxiv-2404.17252

Houtan Ghaffari, Paul Devos


Abstract

Transferring the weights of a pre-trained model to assist another task has become a crucial part of modern deep learning, particularly in data-scarce scenarios. Pre-training refers to the initial step of training models outside the current task of interest, typically on another dataset. It can be done via supervised models using human-annotated datasets or self-supervised models trained on unlabeled datasets. In both cases, many pre-trained models are available to fine-tune for the task of interest. Interestingly, research has shown that pre-trained models from ImageNet can be helpful for audio tasks despite being trained on image datasets. Hence, it's unclear whether in-domain models would be advantageous compared to competent out-domain models, such as convolutional neural networks from ImageNet. Our experiments will demonstrate the usefulness of in-domain models and datasets for bird species recognition by leveraging VICReg, a recent and powerful self-supervised method.


