Harnessing consistency for improved test-time adaptation

Impact factor: 4.2 · CAS Zone 3 (Computer Science) · JCR Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

Image and Vision Computing, Vol. 162, Article 105650 · Published: 2025-07-15 · DOI: 10.1016/j.imavis.2025.105650

Dahuin Jung

Full text: https://www.sciencedirect.com/science/article/pii/S0262885625002380

Citations: 0

Abstract





Test-time adaptation (TTA) is crucial for adjusting pre-trained models to new, unseen test data distributions without ground-truth labels, thereby addressing domain shifts commonly encountered in real-world scenarios. The most widely adopted self-training strategies in TTA are pseudo-labeling and prediction-entropy minimization. Unlike these approaches, some research in natural language processing has explored consistency as a self-training objective. However, the performance gains from consistency maximization have been limited. Based on this finding, we present a novel approach that employs consistency not as the primary self-training objective but as a metric for effective sample weighting and filtering. Our method, Consistency-TTA (CTTA), improves both performance and computational efficiency through two components: a sample weighting method that prioritizes samples robust to perturbations, and a sample filtering method that restricts the backward pass to samples less prone to error accumulation. CTTA can be combined orthogonally with various state-of-the-art baselines, and it also improves performance on extended adaptation tasks such as multi-modal TTA for 3D semantic segmentation and video domain adaptation. We evaluated CTTA on various corruption and natural domain shift datasets and observed consistent, meaningful performance improvements. Moreover, CTTA proved effective on both classification and semantic segmentation benchmarks, such as CarlaTTA, highlighting its versatility across extended TTA applications.
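The idea of using consistency as a weight and a filter, rather than as the loss itself, can be illustrated with a minimal sketch. This is not the CTTA implementation: the abstract does not specify the perturbation, the consistency metric, the self-training loss, or the filtering rule. The sketch therefore assumes, hypothetically, that consistency is 1 minus the total variation distance between softmax predictions on a clean and a perturbed view of the same sample, that the self-training loss is prediction entropy, and that samples whose consistency falls below a hypothetical `threshold` are dropped from the loss (and hence from the backward pass). All function names (`softmax`, `entropy`, `consistency`, `weighted_filtered_entropy`) are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one sample's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def consistency(p: Sequence[float], q: Sequence[float]) -> float:
    """Hypothetical consistency score: 1 - total variation distance, in [0, 1].

    1.0 means the clean and perturbed predictions are identical.
    """
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def weighted_filtered_entropy(
    clean_logits: Sequence[Sequence[float]],
    perturbed_logits: Sequence[Sequence[float]],
    threshold: float = 0.5,  # hypothetical filtering cutoff, not from the paper
) -> Tuple[float, List[int]]:
    """Consistency-weighted entropy averaged over samples that pass the filter.

    Returns (loss, kept_indices). Samples whose consistency is below
    `threshold` are skipped, so they would contribute no gradient.
    """
    kept: List[int] = []
    weighted_sum = 0.0
    for i, (clean, perturbed) in enumerate(zip(clean_logits, perturbed_logits)):
        p = softmax(clean)
        q = softmax(perturbed)
        c = consistency(p, q)
        if c < threshold:
            continue  # filtered out: excluded from the loss and the backward pass
        kept.append(i)
        # Weighting: samples robust to the perturbation (high c) count more.
        weighted_sum += c * entropy(p)
    loss = weighted_sum / len(kept) if kept else 0.0
    return loss, kept


if __name__ == "__main__":
    clean = [[3.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
    perturbed = [[2.9, 0.1, 0.0], [0.0, 3.0, 0.0]]  # sample 1 flips its class
    loss, kept = weighted_filtered_entropy(clean, perturbed)
    print(kept)  # -> [0]
    print(loss)
```

In the usage example, the first sample keeps nearly the same prediction under perturbation and contributes to the loss, while the second flips its predicted class and is excluded. If this were ported to an autodiff framework, one would likely treat the weight `c` as a constant (no gradient through it), so that it only rescales each kept sample's entropy gradient instead of becoming a consistency-maximization objective of its own.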



Source journal

Image and Vision Computing (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 8.50
Self-citation rate: 8.50%
Articles published: 143
Review time: 7.8 months

Journal description: Image and Vision Computing has as its primary aim the provision of an effective medium of interchange for the results of high-quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real-world scenes. It seeks to foster a deeper understanding of the discipline by encouraging quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, and image databases.