Comparison of self-supervised in-domain and supervised out-domain transfer learning for bird species recognition
Houtan Ghaffari, Paul Devos
arXiv - CS - Sound, 2024-04-26, doi: arxiv-2404.17252
Transferring the weights of a pre-trained model to assist another task has
become a crucial part of modern deep learning, particularly in data-scarce
scenarios. Pre-training refers to the initial step of training models outside
the current task of interest, typically on another dataset. It can be done via
supervised models using human-annotated datasets or self-supervised models
trained on unlabeled datasets. In both cases, many pre-trained models are
available to fine-tune for the task of interest. Interestingly, research has
shown that models pre-trained on ImageNet can be helpful for audio tasks
despite being trained on image datasets. It is therefore unclear whether
in-domain models offer an advantage over competent out-domain models, such as
convolutional neural networks pre-trained on ImageNet. Our experiments
demonstrate the usefulness of in-domain models and datasets for bird species
recognition by leveraging VICReg, a recent and powerful self-supervised method.
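
For readers unfamiliar with VICReg, the following is a minimal NumPy sketch of its objective as it is commonly formulated: an invariance term, a variance term, and a covariance term computed on the embeddings of two augmented views of the same batch. The function names, the coefficient defaults (25, 25, 1), and the `gamma` and `eps` values are assumptions taken from the usual description of the method, not from this abstract, which does not state the settings used in the experiments.

```python
import numpy as np


def off_diagonal_sq_sum(m: np.ndarray) -> float:
    """Sum of the squared off-diagonal entries of a square matrix."""
    return float(np.sum(m ** 2) - np.sum(np.diag(m) ** 2))


def vicreg_loss(z_a, z_b, sim_coeff=25.0, std_coeff=25.0, cov_coeff=1.0,
                gamma=1.0, eps=1e-4):
    """Illustrative VICReg-style loss for two (batch, dim) embedding arrays.

    All default values are assumptions (commonly cited settings), not values
    reported in the abstract above.
    """
    z_a = np.asarray(z_a, dtype=float)
    z_b = np.asarray(z_b, dtype=float)
    n, d = z_a.shape

    # Invariance: mean squared error between the two views' embeddings.
    invariance = float(np.mean((z_a - z_b) ** 2))

    # Variance: hinge that pushes each dimension's std above gamma,
    # which discourages collapse to a constant embedding.
    def variance_term(z):
        std = np.sqrt(z.var(axis=0, ddof=1) + eps)
        return float(np.mean(np.maximum(0.0, gamma - std)))

    variance = 0.5 * (variance_term(z_a) + variance_term(z_b))

    # Covariance: penalize off-diagonal covariance so that different
    # embedding dimensions carry decorrelated information.
    def covariance_term(z):
        centered = z - z.mean(axis=0)
        cov = centered.T @ centered / (n - 1)
        return off_diagonal_sq_sum(cov) / d

    covariance = covariance_term(z_a) + covariance_term(z_b)

    total = sim_coeff * invariance + std_coeff * variance + cov_coeff * covariance
    return {"total": total, "invariance": invariance,
            "variance": variance, "covariance": covariance}
```

Under these assumed defaults, fully collapsed embeddings (for example, all zeros for both views) incur only the variance penalty, while identical views whose dimensions are well spread and uncorrelated give a loss of zero. In practice `z_a` and `z_b` would come from an encoder applied to two augmentations of the same audio clips, and the pre-trained encoder would then be fine-tuned on the labeled bird species data.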