{"title":"Test-time adaptation for object detection via Dynamic Dual Teaching","authors":"Siqi Zhang , Lu Zhang , Zhiyong Liu","doi":"10.1016/j.imavis.2025.105740","DOIUrl":null,"url":null,"abstract":"<div><div>Test-Time Adaptation (TTA) is a practical setting in real-world applications, which aims to adapt a source-trained model to target domains with online unlabeled test data streams. Current approaches often rely on self-training, utilizing supervision signals from the source-trained model, suffering from poor adaptation due to diverse domain shifts. In this paper, we propose a novel test-time adaptation method for object detection guided by dual teachers, termed <strong>D</strong>ynamic <strong>D</strong>ual <strong>T</strong>eaching (<strong>DDT</strong>). Inspired by the generalization potentials of Vision-Language Models (VLMs), we integrate the VLM as an additional language-driven instructor. This integration exploits the domain-robustness of language prompts to mitigate domain shifts, collaborating with the teacher of source information within the teacher–student framework. Firstly, we utilize an ensemble prompt to guide the prediction process of the language-driven instructor. Secondly, a dynamic fusion strategy of the dual teachers is designed to generate high-quality pseudo-labels for student learning. Moreover, we incorporate a dual prediction consistency regularization to further mitigate the sensitivity of the adapted detector to domain shifts. Experiments on diverse domain adaptation benchmarks demonstrate that the proposed DDT method achieves state-of-the-art performance on both online and offline domain adaptation settings.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"163 ","pages":"Article 105740"},"PeriodicalIF":4.2000,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885625003282","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Test-Time Adaptation (TTA) is a practical setting for real-world applications that aims to adapt a source-trained model to target domains using online streams of unlabeled test data. Current approaches often rely on self-training with supervision signals from the source-trained model, and therefore adapt poorly under diverse domain shifts. In this paper, we propose a novel test-time adaptation method for object detection guided by dual teachers, termed Dynamic Dual Teaching (DDT). Inspired by the generalization potential of Vision-Language Models (VLMs), we integrate a VLM as an additional language-driven instructor. This integration exploits the domain robustness of language prompts to mitigate domain shifts, collaborating with the source-information teacher within the teacher–student framework. First, we use an ensemble prompt to guide the prediction process of the language-driven instructor. Second, a dynamic fusion strategy over the dual teachers is designed to generate high-quality pseudo-labels for student learning. Moreover, we incorporate a dual prediction consistency regularization to further reduce the adapted detector's sensitivity to domain shifts. Experiments on diverse domain adaptation benchmarks demonstrate that the proposed DDT method achieves state-of-the-art performance in both online and offline domain adaptation settings.
Journal overview:
Image and Vision Computing has as its primary aim the provision of an effective medium of interchange for the results of high-quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real-world scenes. It seeks to foster a deeper understanding of the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.