{"title":"Language-guided Alignment and Distillation for Source-free Domain Adaptation","authors":"Jiawen Peng, Jiaxin Chen, Rong Pan, Andy J. Ma","doi":"10.1016/j.neucom.2025.130501","DOIUrl":null,"url":null,"abstract":"<div><div>Source-free domain adaptation (SFDA) is a practical problem in which a pre-trained source model is adapted to an unlabeled target domain without accessing the labeled source data. Although recent studies have successfully incorporated vision–language models (VLMs) like CLIP into SFDA frameworks, the performance of existing methods may be limited due to their reliance on coarse-grained class prompts, in which fine-grained textual knowledge has not been fully exploited. To overcome this limitation, we develop a novel framework of Language-guided Alignment and Distillation (LAD) by integrating visual features with fine-grained textual descriptions generated by pre-trained captioning models. Our method consists of two innovative designs, i.e., category-aware modality alignment (CMA) and language-guided knowledge distillation (LKD). CMA aligns cross-modal feature representations with a gating function to filter out high-confidence same-class samples from negatives to preserve intra-class similarity. LKD better adapts the vision encoder to the target domain through adaptive modality fusion and dual-level distillation guided by both visual and textual modalities. Extensive experiments on five benchmarks, including <em>both image and video recognition</em>, demonstrate that our method consistently outperforms the state of the arts for SFDA, e.g., +2.1% in Office-Home and +4.3% in UCF-HMDB.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"648 ","pages":"Article 130501"},"PeriodicalIF":5.5000,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231225011737","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Source-free domain adaptation (SFDA) is a practical problem in which a pre-trained source model is adapted to an unlabeled target domain without access to the labeled source data. Although recent studies have successfully incorporated vision–language models (VLMs) such as CLIP into SFDA frameworks, the performance of existing methods may be limited by their reliance on coarse-grained class prompts, which leave fine-grained textual knowledge unexploited. To overcome this limitation, we develop a novel framework, Language-guided Alignment and Distillation (LAD), that integrates visual features with fine-grained textual descriptions generated by pre-trained captioning models. Our method consists of two innovative designs: category-aware modality alignment (CMA) and language-guided knowledge distillation (LKD). CMA aligns cross-modal feature representations using a gating function that filters high-confidence same-class samples out of the negative set, preserving intra-class similarity. LKD better adapts the vision encoder to the target domain through adaptive modality fusion and dual-level distillation guided by both the visual and textual modalities. Extensive experiments on five benchmarks, covering both image and video recognition, demonstrate that our method consistently outperforms state-of-the-art SFDA methods, e.g., by +2.1% on Office-Home and +4.3% on UCF-HMDB.
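The abstract does not give implementation details for either module, so the following is a minimal PyTorch-style sketch of the two ideas it names, not the authors' code. The gating function is rendered here as a pseudo-label confidence gate inside an InfoNCE-style image–caption alignment loss, and the dual-level distillation as a prediction-level KL term plus a feature-level term against a fused visual–textual teacher; every name and hyperparameter (gated_cross_modal_loss, dual_level_distillation, gate_threshold, temperature, feat_weight) is an illustrative assumption.

import torch
import torch.nn.functional as F

def gated_cross_modal_loss(img_feats, txt_feats, pseudo_labels,
                           temperature=0.07, gate_threshold=0.9):
    """InfoNCE-style alignment between image features and caption features.

    img_feats, txt_feats: (N, D) feature batches for paired images/captions.
    pseudo_labels: (N, C) softmax predictions, used here (as an assumption)
    to gate out high-confidence same-class pairs from the negative set so
    that intra-class similarity is preserved.
    """
    img = F.normalize(img_feats, dim=1)
    txt = F.normalize(txt_feats, dim=1)
    logits = img @ txt.t() / temperature               # (N, N) similarity matrix

    conf, cls = pseudo_labels.max(dim=1)               # per-sample confidence and class
    same_class = cls.unsqueeze(0) == cls.unsqueeze(1)  # predicted same class
    confident = (conf.unsqueeze(0) > gate_threshold) & (conf.unsqueeze(1) > gate_threshold)
    gate = same_class & confident                      # high-confidence same-class pairs
    gate.fill_diagonal_(False)                         # never gate the positive pair itself

    logits = logits.masked_fill(gate, float('-inf'))   # drop gated pairs from the negatives
    targets = torch.arange(img.size(0), device=img.device)
    return F.cross_entropy(logits, targets)

def dual_level_distillation(student_logits, teacher_logits,
                            student_feats, fused_teacher_feats,
                            temperature=2.0, feat_weight=1.0):
    """One plausible reading of 'dual-level distillation': a prediction-level
    KL term plus a feature-level term against a fused (visual + textual)
    teacher representation."""
    kl = F.kl_div(F.log_softmax(student_logits / temperature, dim=1),
                  F.softmax(teacher_logits / temperature, dim=1),
                  reduction='batchmean') * temperature ** 2
    feat = F.mse_loss(F.normalize(student_feats, dim=1),
                      F.normalize(fused_teacher_feats, dim=1))
    return kl + feat_weight * feat

Masking gated entries to -inf removes them from the softmax denominator, which is the standard way to exclude candidate negatives from a contrastive loss without disturbing the positive pair on the diagonal.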
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Its essential topics are neurocomputing theory, practice, and applications.