SETA: Semantic-Aware Edge-Guided Token Augmentation for Domain Generalization

IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society. Pub Date: 2024-10-04. DOI: 10.1109/TIP.2024.3470517

Jintao Guo;Lei Qi;Yinghuan Shi;Yang Gao

Volume 33, pages 5622-5636. Publication type: Journal Article. IEEE Xplore: https://ieeexplore.ieee.org/document/10705912/

Citations: 0

Abstract



SETA: Semantic-Aware Edge-Guided Token Augmentation for Domain Generalization

Domain generalization (DG) aims to make models robust to domain shifts without access to the target domains. A prevalent category of DG methods is data augmentation, which generates virtual samples to simulate domain shifts. However, existing augmentation techniques in DG are mainly tailored to convolutional neural networks (CNNs), with limited exploration of token-based architectures, i.e., vision transformer (ViT) and multi-layer perceptron (MLP) models. In this paper, we study the impact of prior CNN-based augmentation methods on token-based models and find that their performance is suboptimal because they do not incentivize the model to learn holistic shape information. To tackle this issue, we propose the Semantic-aware Edge-guided Token Augmentation (SETA) method. SETA transforms token features by perturbing local edge cues while preserving global shape features, thereby strengthening the model's learning of shape information. To further improve generalization, we introduce two stylized variants of our method, each combined with one of two state-of-the-art (SOTA) style augmentation methods in DG. We provide theoretical insight into our method, showing that it reduces the generalization risk bound. Comprehensive experiments on five benchmarks demonstrate that our method achieves SOTA performance across various ViT and MLP architectures. Our code is available at https://github.com/lingeringlight/SETA.
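The idea of perturbing local edge cues while preserving global shape features can be illustrated loosely as follows. This is a minimal NumPy sketch under our own assumptions, not the authors' implementation (for that, see the repository linked in the abstract). It assumes patch tokens laid out on an H×W grid, approximates "shape" by a box-filtered token map and "edge cues" by the residual, and swaps residuals across samples in a batch. The function name `edge_perturb_tokens` and all of its parameters are hypothetical.

```python
import numpy as np


def edge_perturb_tokens(tokens, grid_hw, perm=None, rng=None, kernel=3):
    """Illustrative token augmentation (an assumption, not the SETA algorithm).

    tokens:  array of shape (B, N, C), patch tokens only (no class token).
    grid_hw: (H, W) with H * W == N.
    perm:    optional batch permutation; drawn at random if omitted.
    """
    b, n, c = tokens.shape
    h, w = grid_hw
    if n != h * w:
        raise ValueError("grid_hw does not match the number of tokens")

    x = tokens.reshape(b, h, w, c).astype(np.float64)

    # Box filter over the token grid: a crude stand-in for smooth, shape-like content.
    pad = kernel // 2
    padded = np.pad(x, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode="edge")
    smooth = np.zeros_like(x)
    for dy in range(kernel):
        for dx in range(kernel):
            smooth += padded[:, dy:dy + h, dx:dx + w, :]
    smooth /= kernel * kernel

    # The residual holds local, high-frequency (edge-like) detail.
    residual = x - smooth

    if perm is None:
        rng = rng if rng is not None else np.random.default_rng()
        perm = rng.permutation(b)

    # Keep each sample's smooth part, but borrow local detail from another sample.
    mixed = smooth + residual[perm]
    return mixed.reshape(b, n, c)
```

With the identity permutation the function returns the input unchanged (up to floating-point error), which is a quick sanity check that only the residual component is being exchanged between samples.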


Source journal

IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society

Self-citation rate

0.00%

Publication volume