Semantic Cross Attention for Few-shot Learning

Asian Conference on Machine Learning. Pub Date: 2022-10-12. DOI: 10.48550/arXiv.2210.06311

Bin Xiao, Chien Liu, W. Hsaio


Citations: 1

Abstract

Few-shot learning (FSL) has attracted considerable attention recently. Among existing approaches, metric-based methods train an embedding network that pulls similar samples close together while pushing dissimilar samples as far apart as possible, and they achieve promising results. In image classification, FSL is characterized by using only a few images to train a model that can generalize to novel classes, but this setting makes it difficult to learn visual features that capture variations in the images' appearance. Model training is likely to move in the wrong direction, because images in the same semantic class may have dissimilar appearances, whereas images in different semantic classes may share a similar appearance. We argue that FSL can benefit from additional semantic features when learning discriminative feature representations. This study therefore proposes a multi-task learning approach that treats the semantic features of label text as an auxiliary task to help boost the performance of the FSL task. Our proposed model uses word-embedding representations as semantic features to help train the embedding network, and a semantic cross-attention module to bridge the semantic features into the visual modality. The proposed approach is simple but produces excellent results. We apply our proposed approach to two previous metric-based FSL methods, and it substantially improves the performance of both. The source code for our model is accessible from GitHub.
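The abstract does not spell out how the semantic cross-attention module is built, so the following is only a minimal sketch of generic scaled dot-product cross-attention, not the authors' implementation. It assumes that a label word embedding acts as the query and that per-region visual features, already projected to the same dimension, act as keys and values. The function names and toy data are hypothetical and chosen purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross_attention(query, keys, values):
    """Scaled dot-product attention with a single query vector.

    query:  semantic vector (assumed: a label word embedding), length d
    keys:   one vector per image region, each of length d
    values: one vector per image region, all of the same length
    Returns (attended vector, attention weights).
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    attended = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return attended, weights


# Hypothetical toy data: three image regions with 2-d features, assumed to be
# already projected into the same space as the word embedding.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
word_embedding = [2.0, 0.0]

attended, weights = cross_attention(word_embedding, regions, regions)
```

In this toy case the regions aligned with the word embedding receive larger attention weights, so the attended feature leans toward the semantically relevant part of the image. In a multi-task setup of the kind the abstract describes, a feature like this could feed an auxiliary loss next to the usual metric-based FSL loss, but how the two are combined is not stated in the abstract.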

