Bayesian Cross-Modal Alignment Learning for Few-Shot Out-of-Distribution Generalization

Lin Zhu, Xinbing Wang, Cheng Zhou, Nanyang Ye
{"title":"Bayesian Cross-Modal Alignment Learning for Few-Shot Out-of-Distribution Generalization","authors":"Lin Zhu, Xinbing Wang, Cheng Zhou, Nanyang Ye","doi":"10.1609/aaai.v37i9.26355","DOIUrl":null,"url":null,"abstract":"Recent advances in large pre-trained models showed promising results in few-shot learning. However, their generalization ability on two-dimensional Out-of-Distribution (OoD) data, i.e., correlation shift and diversity shift, has not been thoroughly investigated. Researches have shown that even with a significant amount of training data, few methods can achieve better performance than the standard empirical risk minimization method (ERM) in OoD generalization. This few-shot OoD generalization dilemma emerges as a challenging direction in deep neural network generalization research, where the performance suffers from overfitting on few-shot examples and OoD generalization errors. In this paper, leveraging a broader supervision source, we explore a novel Bayesian cross-modal image-text alignment learning method (Bayes-CAL) to address this issue. Specifically, the model is designed as only text representations are fine-tuned via a Bayesian modelling approach with gradient orthogonalization loss and invariant risk minimization (IRM) loss. The Bayesian approach is essentially introduced to avoid overfitting the base classes observed during training and improve generalization to broader unseen classes. The dedicated loss is introduced to achieve better image-text alignment by disentangling the causal and non-casual parts of image features. Numerical experiments demonstrate that Bayes-CAL achieved state-of-the-art OoD generalization performances on two-dimensional distribution shifts. Moreover, compared with CLIP-like models, Bayes-CAL yields more stable generalization performances on unseen classes. Our code is available at https://github.com/LinLLLL/BayesCAL.","PeriodicalId":74506,"journal":{"name":"Proceedings of the ... AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence","volume":"26 1","pages":"11461-11469"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1609/aaai.v37i9.26355","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Recent advances in large pre-trained models have shown promising results in few-shot learning. However, their generalization ability on two-dimensional Out-of-Distribution (OoD) data, i.e., correlation shift and diversity shift, has not been thoroughly investigated. Research has shown that, even with a significant amount of training data, few methods achieve better OoD generalization than the standard empirical risk minimization (ERM) method. This few-shot OoD generalization dilemma emerges as a challenging direction in deep neural network generalization research, where performance suffers both from overfitting to the few-shot examples and from OoD generalization error. In this paper, leveraging a broader supervision source, we explore a novel Bayesian cross-modal image-text alignment learning method (Bayes-CAL) to address this issue. Specifically, the model is designed so that only the text representations are fine-tuned, via a Bayesian modelling approach with a gradient orthogonalization loss and an invariant risk minimization (IRM) loss. The Bayesian approach is introduced essentially to avoid overfitting to the base classes observed during training and to improve generalization to broader unseen classes. The dedicated losses are introduced to achieve better image-text alignment by disentangling the causal and non-causal parts of image features. Numerical experiments demonstrate that Bayes-CAL achieves state-of-the-art OoD generalization performance under two-dimensional distribution shifts. Moreover, compared with CLIP-like models, Bayes-CAL yields more stable generalization performance on unseen classes. Our code is available at https://github.com/LinLLLL/BayesCAL.
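
The abstract names three standard ingredients without spelling out the training objective: an IRM-style invariance penalty, a gradient orthogonalization term between the causal and non-causal parts of the objective, and a Bayesian (variational) treatment of the fine-tuned text representations. The PyTorch sketch below illustrates one plausible form of each piece; the function names, the exact form of the orthogonalization term, and the identity-initialized text head are assumptions made for illustration, not the authors' implementation (see the linked repository for that).

import torch
import torch.nn.functional as F

def irm_penalty(logits, labels):
    # IRMv1 penalty (Arjovsky et al., 2019): squared norm of the gradient of
    # the per-environment risk w.r.t. a dummy classifier scale fixed at 1.0.
    scale = torch.ones(1, device=logits.device, requires_grad=True)
    risk = F.cross_entropy(logits * scale, labels)
    (grad,) = torch.autograd.grad(risk, [scale], create_graph=True)
    return (grad ** 2).sum()

def grad_orthogonality_penalty(loss_causal, loss_noncausal, params):
    # Assumed form of the gradient orthogonalization loss: penalize the
    # squared cosine similarity between the gradients that the causal and
    # non-causal loss components induce on the shared text-side parameters.
    g1 = torch.autograd.grad(loss_causal, params, create_graph=True, retain_graph=True)
    g2 = torch.autograd.grad(loss_noncausal, params, create_graph=True, retain_graph=True)
    g1 = torch.cat([g.reshape(-1) for g in g1])
    g2 = torch.cat([g.reshape(-1) for g in g2])
    return F.cosine_similarity(g1, g2, dim=0) ** 2

class BayesianTextHead(torch.nn.Module):
    # Mean-field Gaussian posterior over a text-side projection, sampled via
    # the reparameterization trick; only this head would be fine-tuned, with
    # the image and text encoders kept frozen. Identity initialization of the
    # mean (so the head starts as a no-op) is an assumption.
    def __init__(self, dim):
        super().__init__()
        self.mu = torch.nn.Parameter(torch.eye(dim))
        self.log_sigma = torch.nn.Parameter(torch.full((dim, dim), -4.0))

    def forward(self, text_features):
        weight = self.mu + torch.randn_like(self.mu) * self.log_sigma.exp()
        return text_features @ weight

    def kl_to_standard_prior(self):
        # KL(q || N(0, I)), summed over weights; regularizes the posterior.
        var = (2 * self.log_sigma).exp()
        return 0.5 * (var + self.mu ** 2 - 1 - 2 * self.log_sigma).sum()

A full training step would then combine, with tuned weights, the per-environment image-text alignment (cross-entropy) risks, the IRM penalty averaged over environments, the orthogonalization term, and the KL regularizer on the variational head, as is standard for variational fine-tuning.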