Accelerated Self-Supervised Multi-Illumination Color Constancy with Hybrid Knowledge Distillation.

IF 20.8, Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Pattern Analysis and Machine Intelligence Pub Date : 2025-06-25 DOI:10.1109/tpami.2025.3583090

Ziyu Feng,Bing Li,Congyan Lang,Zheming Xu,Haina Qin,Juan Wang,Weihua Xiong


Citations: 0

Abstract

Color constancy, the human visual system's ability to perceive consistent colors under varying illumination conditions, is crucial for accurate color perception. Recently, deep learning algorithms have been applied to this task and have achieved remarkable results. However, existing methods are limited by the scale of current multi-illumination datasets and by model size, which hinders their ability to learn discriminative features effectively and reduces their practical value for deployment in cameras. To overcome these limitations, this paper proposes a multi-illumination color constancy approach based on self-supervised learning and knowledge distillation. The approach has three phases: self-supervised pre-training, supervised fine-tuning, and knowledge distillation. During the pre-training phase, we train Transformer-based and U-Net-based encoders on two pretext tasks: a light normalization task to learn a contextual representation of lighting color, and a grayscale colorization task to acquire objects' inherent color information. For the downstream color constancy task, we fine-tune the encoders and design a lightweight decoder to obtain better illumination distributions with fewer parameters. During the knowledge distillation phase, we introduce a hybrid knowledge distillation technique to align CNN features with those of the Transformer and the U-Net, respectively. Our proposed method outperforms state-of-the-art techniques on multi-illumination and single-illumination benchmarks. Extensive ablation studies and visualizations confirm the effectiveness of our model.
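
To make the distillation phase more concrete, the following is a minimal sketch of what aligning student CNN features with two teachers could look like. It is an illustration only: the abstract does not specify the loss, so the normalized mean-squared-error form, the function names, the weights `w_transformer` and `w_unet`, and the assumption that student features have already been projected to the teachers' shapes are all hypothetical choices, not the authors' method.

```python
import numpy as np


def feature_alignment_loss(student, teacher):
    """Mean squared error between channel-normalized feature maps.

    Both inputs are arrays of shape (C, H, W) with identical shapes
    (assumption: the student features were already projected to match).
    """
    def normalize(x):
        # L2-normalize along the channel axis so that differences in feature
        # scale between the two networks do not dominate the loss.
        return x / (np.linalg.norm(x, axis=0, keepdims=True) + 1e-8)

    return float(np.mean((normalize(student) - normalize(teacher)) ** 2))


def hybrid_distillation_loss(student_for_transformer, teacher_transformer,
                             student_for_unet, teacher_unet,
                             w_transformer=0.5, w_unet=0.5):
    """Weighted sum of two alignment terms, one per teacher (hypothetical form)."""
    loss_t = feature_alignment_loss(student_for_transformer, teacher_transformer)
    loss_u = feature_alignment_loss(student_for_unet, teacher_unet)
    return w_transformer * loss_t + w_unet * loss_u


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-ins for feature maps; real features would come from the networks.
    cnn_feat_a = rng.normal(size=(16, 8, 8))
    cnn_feat_b = rng.normal(size=(16, 8, 8))
    transformer_feat = rng.normal(size=(16, 8, 8))
    unet_feat = rng.normal(size=(16, 8, 8))
    print(hybrid_distillation_loss(cnn_feat_a, transformer_feat, cnn_feat_b, unet_feat))
```

In this sketch the loss is zero when the student reproduces both teachers' normalized features exactly and grows as they diverge; in practice such a term would typically be added to the supervised illumination-estimation loss while training the CNN student.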


Source journal

IEEE Transactions on Pattern Analysis and Machine Intelligence (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore

28.40

Self-citation rate

3.00%

Articles published

885

Review time

8.5 months

Journal description: The IEEE Transactions on Pattern Analysis and Machine Intelligence publishes articles on all traditional areas of computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence, with a particular emphasis on machine learning for pattern analysis. Areas such as techniques for visual search, document and handwriting analysis, medical image analysis, video and image sequence analysis, content-based retrieval of image and video, face and gesture recognition, and relevant specialized hardware and/or software architectures are also covered.