IEEE Transactions on Image Processing (a publication of the IEEE Signal Processing Society): Latest Publications

Text-Video Retrieval With Global-Local Semantic Consistent Learning
Haonan Zhang;Pengpeng Zeng;Lianli Gao;Jingkuan Song;Yihang Duan;Xinyu Lyu;Heng Tao Shen
{"title":"Text-Video Retrieval With Global-LocalSemantic Consistent Learning","authors":"Haonan Zhang;Pengpeng Zeng;Lianli Gao;Jingkuan Song;Yihang Duan;Xinyu Lyu;Heng Tao Shen","doi":"10.1109/TIP.2025.3574925","DOIUrl":"10.1109/TIP.2025.3574925","url":null,"abstract":"Adapting large-scale image-text pre-training models, e.g., CLIP, to the video domain represents the current state-of-the-art for text-video retrieval. The primary approaches involve transferring text-video pairs to a common embedding space and leveraging cross-modal interactions on specific entities for semantic alignment. Though effective, these paradigms entail prohibitive computational costs, leading to inefficient retrieval. To address this, we propose a simple yet effective method, Global-Local Semantic Consistent Learning (GLSCL), which capitalizes on latent shared semantics across modalities for text-video retrieval. Specifically, we introduce a parameter-free global interaction module to explore coarse-grained alignment. Then, we devise a shared local interaction module that employs several learnable queries to capture latent semantic concepts for learning fine-grained alignment. Furthermore, an Inter-Consistency Loss (ICL) is devised to accomplish the concept alignment between the visual query and corresponding textual query, and an Intra-Diversity Loss (IDL) is developed to repulse the distribution within visual (textual) queries to generate more discriminative concepts. Extensive experiments on five widely used benchmarks (i.e., MSR-VTT, MSVD, DiDeMo, LSMDC, and ActivityNet) substantiate the superior effectiveness and efficiency of the proposed method. Remarkably, our method achieves comparable performance with SOTA as well as being nearly 220 times faster in terms of computational cost. Code is available at: <monospace><uri>https://github.com/zchoi/GLSCL</uri></monospace>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3463-3474"},"PeriodicalIF":0.0,"publicationDate":"2025-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144218890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
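The abstract pairs an Inter-Consistency Loss (aligning each visual concept query with its textual counterpart) with an Intra-Diversity Loss (repulsing queries within one modality). The sketch below is one plausible PyTorch instantiation of that pairing, not the paper's code; the tensor shapes, temperature, and loss weights are assumptions.

```python
import torch
import torch.nn.functional as F

def inter_consistency_loss(visual_q, textual_q, temperature=0.07):
    """ICL sketch: for each video-text pair, align the k-th visual concept query with the
    k-th textual concept query and contrast it against the other K-1 concepts (InfoNCE)."""
    v = F.normalize(visual_q, dim=-1)                        # (B, K, D)
    t = F.normalize(textual_q, dim=-1)
    logits = torch.bmm(v, t.transpose(1, 2)) / temperature   # (B, K, K) concept similarities
    B, K, _ = logits.shape
    targets = torch.arange(K, device=logits.device).expand(B, K).reshape(-1)
    loss_v2t = F.cross_entropy(logits.reshape(B * K, K), targets)
    loss_t2v = F.cross_entropy(logits.transpose(1, 2).reshape(B * K, K), targets)
    return 0.5 * (loss_v2t + loss_t2v)

def intra_diversity_loss(queries):
    """IDL sketch: penalize positive cosine similarity between different concept queries of
    the same sample, pushing the learned concepts apart."""
    q = F.normalize(queries, dim=-1)                     # (B, K, D)
    sim = torch.bmm(q, q.transpose(1, 2))                # (B, K, K) pairwise similarities
    sim = sim - torch.eye(q.shape[1], device=q.device)   # zero out the diagonal (self-similarity)
    return sim.clamp(min=0).mean()

# Toy usage with assumed shapes: a batch of 4 pairs, 8 concept queries, 256 channels.
v, t = torch.randn(4, 8, 256), torch.randn(4, 8, 256)
loss = inter_consistency_loss(v, t) + 0.1 * (intra_diversity_loss(v) + intra_diversity_loss(t))
```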
MCT-CCDiff: Context-Aware Contrastive Diffusion Model With Mediator-Bridging Cross-Modal Transformer for Image Change Captioning
Jinhong Hu;Guojin Zhong;Jin Yuan;Wenbo Pan;Xiaoping Wang
{"title":"MCT-CCDiff: Context-Aware Contrastive Diffusion Model With Mediator-Bridging Cross-Modal Transformer for Image Change Captioning","authors":"Jinhong Hu;Guojin Zhong;Jin Yuan;Wenbo Pan;Xiaoping Wang","doi":"10.1109/TIP.2025.3573471","DOIUrl":"10.1109/TIP.2025.3573471","url":null,"abstract":"Recent advancements in diffusion models (DMs) have showcased superior capabilities in generating images and text. This paper first introduces DMs for image change captioning (ICC) and proposes a novel Context-aware Contrastive Diffusion model with Mediator-bridging Cross-modal Transformer (MCT-CCDiff) to accurately predict visual difference descriptions conditioned on two similar images. Technically, MCT-CCDiff develops a Text Embedding Contrastive Loss (TECL) that leverages both positive and negative samples to more effectively distinguish text embeddings, thus generating more discriminative text representations for ICC. To accurately predict visual difference descriptions, MCT-CCDiff introduces a Mediator-bridging Cross-modal Transformer (MCTrans) designed to efficiently explore the cross-modal correlations between visual differences and corresponding text by using a lightweight mediator, mitigating interference from visual redundancy and reducing interaction overhead. Additionally, it incorporates context-augmented denoising to further understand the contextual relationships within caption words implemented by a revised diffusion loss, which provides a tighter optimization bound, leading to enhanced optimization effects for high-quality text generation. Extensive experiments conducted on four benchmark datasets clearly demonstrate that our MCT-CCDiff significantly outperforms state-of-the-art methods in the field of ICC, marking a notable advancement in the generation of precise visual difference descriptions.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3294-3308"},"PeriodicalIF":0.0,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144201922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
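The Text Embedding Contrastive Loss is described only as using positive and negative samples to separate text embeddings. A minimal InfoNCE-style sketch under that reading follows; the (B, D) / (B, N, D) shapes and the temperature are assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

def text_embedding_contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """Contrast an anchor caption embedding against one positive and N negative embeddings.

    anchor, positive: (B, D); negatives: (B, N, D).
    """
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    n = F.normalize(negatives, dim=-1)
    pos_logit = (a * p).sum(-1, keepdim=True)          # (B, 1) similarity to the positive
    neg_logits = torch.einsum("bd,bnd->bn", a, n)      # (B, N) similarities to the negatives
    logits = torch.cat([pos_logit, neg_logits], dim=1) / temperature
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)             # the positive sits at index 0

# Toy usage: batch of 8 captions, 512-d embeddings, 16 negatives each.
loss = text_embedding_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512),
                                        torch.randn(8, 16, 512))
```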
High-Resolution Natural Image Matting by Refining Low-Resolution Alpha Mattes
Xianmin Ye;Yihui Liang;Mian Tan;Fujian Feng;Lin Wang;Han Huang
{"title":"High-Resolution Natural Image Matting by Refining Low-Resolution Alpha Mattes","authors":"Xianmin Ye;Yihui Liang;Mian Tan;Fujian Feng;Lin Wang;Han Huang","doi":"10.1109/TIP.2025.3573620","DOIUrl":"10.1109/TIP.2025.3573620","url":null,"abstract":"High-resolution natural image matting plays an important role in image editing, film-making and remote sensing due to its ability of accurately extract the foreground from a natural background. However, due to the complexity brought about by the proliferation of resolution, the existing image matting methods cannot obtain high-quality alpha mattes on high-resolution images in reasonable time. To overcome this challenge, we introduce a high-resolution image matting framework based on alpha matte refinement from low-resolution to high-resolution (HRIMF-AMR). The proposed framework transforms the complex high-resolution image matting problem into low-resolution image matting problem and high-resolution alpha matte refinement problem. While the first problem is solved by adopting an existing image matting method, the latter is addressed by applying the Detail Difference Feature Extractor (DDFE) designed as a part of our work. The DDFE extracts detail difference features from high-resolution images by measuring the image feature difference between high-resolution images and low-resolution images. The low-resolution alpha matte is refined according to the extracted detail difference feature, providing the high-resolution alpha matte. In addition, the Matte Detail Resolution Difference (MDRD) loss is introduced to train the DDFE, which imposes an additional constraint on the extraction of detail difference features with mattes. Experimental results show that integrating HRIMF-AMR significantly enhances the performance of existing matting methods on high-resolution images of Transparent-460 and Alphamatting. Project page: <uri>https://github.com/yexianmin/HRAMR-Matting</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3323-3335"},"PeriodicalIF":0.0,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144201876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
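The framework hinges on extracting detail difference features (high-resolution features minus their low-resolution counterparts) and using them to refine an upsampled coarse alpha matte. The toy module below illustrates that data flow only; the 4x scale factor, channel widths, and residual refinement head are assumptions rather than the DDFE architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlphaRefiner(nn.Module):
    """Upsample a low-resolution alpha matte and refine it with detail-difference features."""

    def __init__(self, ch=32):
        super().__init__()
        self.feat = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.refine = nn.Sequential(
            nn.Conv2d(ch + 1, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 3, padding=1))

    def forward(self, hr_image, lr_alpha):
        # Detail-difference features: high-res features minus upsampled low-res features.
        hr_feat = self.feat(hr_image)
        lr_image = F.interpolate(hr_image, scale_factor=0.25, mode="bilinear",
                                 align_corners=False)
        lr_feat_up = F.interpolate(self.feat(lr_image), size=hr_feat.shape[-2:],
                                   mode="bilinear", align_corners=False)
        detail = hr_feat - lr_feat_up
        # Refine the bilinearly upsampled coarse alpha with a predicted residual.
        alpha_up = F.interpolate(lr_alpha, size=hr_feat.shape[-2:], mode="bilinear",
                                 align_corners=False)
        residual = self.refine(torch.cat([detail, alpha_up], dim=1))
        return (alpha_up + residual).clamp(0, 1)

# Toy usage: a 1024x1024 image paired with a 256x256 coarse alpha matte.
alpha_hr = AlphaRefiner()(torch.rand(1, 3, 1024, 1024), torch.rand(1, 1, 256, 256))
```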
Prototypical Distribution Divergence Loss for Image Restoration
Jialun Peng;Jingjing Fu;Dong Liu
{"title":"Prototypical Distribution Divergence Loss for Image Restoration","authors":"Jialun Peng;Jingjing Fu;Dong Liu","doi":"10.1109/TIP.2025.3572818","DOIUrl":"10.1109/TIP.2025.3572818","url":null,"abstract":"Neural networks have achieved significant advances in the field of image restoration and much research has focused on designing new architectures for convolutional neural networks (CNNs) and Transformers. The choice of loss functions, despite being a critical factor when training image restoration networks, has attracted little attention. The existing losses are primarily based on semantic or hand-crafted representations. Recently, discrete representations have demonstrated strong capabilities in representing images. In this work, we explore the loss of discrete representations for image restoration. Specifically, we propose a Local Residual Quantized Variational AutoEncoder (Local RQ-VAE) to learn prototype vectors that represent the local details of high-quality images. Then we propose a Prototypical Distribution Divergence (PDD) loss that measures the Kullback-Leibler divergence between the prototypical distributions of the restored and target images. Experimental results demonstrate that our PDD loss improves the restored images in both PSNR and visual quality for state-of-the-art CNNs and Transformers on several image restoration tasks, including image super-resolution, image denoising, image motion deblurring, and defocus deblurring.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3563-3577"},"PeriodicalIF":0.0,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144201878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
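The PDD loss is defined as the KL divergence between the prototypical distributions of the restored and target images, with the learned prototype vectors acting as a codebook. The sketch below computes that divergence assuming soft assignments via cosine similarity and a temperature; those two choices are not specified in the abstract and are assumptions here.

```python
import torch
import torch.nn.functional as F

def prototypical_distribution(features, prototypes, temperature=1.0):
    """Soft assignment of each local feature to the prototype vectors.

    features: (B, N, D) local features; prototypes: (K, D) learned codebook.
    Returns (B, N, K) probabilities over prototypes.
    """
    logits = torch.einsum("bnd,kd->bnk",
                          F.normalize(features, dim=-1),
                          F.normalize(prototypes, dim=-1)) / temperature
    return logits.softmax(dim=-1)

def pdd_loss(restored_feat, target_feat, prototypes):
    """KL divergence between the prototypical distributions of restored and target images."""
    p = prototypical_distribution(target_feat, prototypes)    # reference distribution
    q = prototypical_distribution(restored_feat, prototypes)  # distribution to be matched
    return F.kl_div(q.clamp_min(1e-8).log(), p, reduction="batchmean")

# Toy usage: 2 images, 196 local features of dimension 64, a codebook of 512 prototypes.
loss = pdd_loss(torch.randn(2, 196, 64), torch.randn(2, 196, 64), torch.randn(512, 64))
```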
PLGS: Robust Panoptic Lifting With 3D Gaussian Splatting
Yu Wang;Xiaobao Wei;Ming Lu;Guoliang Kang
{"title":"PLGS: Robust Panoptic Lifting With 3D Gaussian Splatting","authors":"Yu Wang;Xiaobao Wei;Ming Lu;Guoliang Kang","doi":"10.1109/TIP.2025.3573524","DOIUrl":"10.1109/TIP.2025.3573524","url":null,"abstract":"Previous methods utilize the Neural Radiance Field (NeRF) for panoptic lifting, while their training and rendering speed are unsatisfactory. In contrast, 3D Gaussian Splatting (3DGS) has emerged as a prominent technique due to its rapid training and rendering speed. However, unlike NeRF, the conventional 3DGS may not satisfy the basic smoothness assumption as it does not rely on any parameterized structures to render (e.g., MLPs). Consequently, the conventional 3DGS is, in nature, more susceptible to noisy 2D mask supervision. In this paper, we propose a new method called PLGS that enables 3DGS to generate consistent panoptic segmentation masks from noisy 2D segmentation masks while maintaining superior efficiency compared to NeRF-based methods. Specifically, we build a panoptic-aware structured 3D Gaussian model to introduce smoothness and design effective noise reduction strategies. For the semantic field, instead of initialization with structure from motion, we construct reliable semantic anchor points to initialize the 3D Gaussians. We then use these anchor points as smooth regularization during training. Additionally, we present a self-training approach using pseudo labels generated by merging the rendered masks with the noisy masks to enhance the robustness of PLGS. For the instance field, we project the 2D instance masks into 3D space and match them with oriented bounding boxes to generate cross-view consistent instance masks for supervision. Experiments on various benchmarks demonstrate that our method outperforms previous state-of-the-art methods in terms of both segmentation quality and speed.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3377-3388"},"PeriodicalIF":0.0,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144201970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
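For the self-training step, the abstract only says that pseudo labels are generated by merging the rendered masks with the noisy 2D masks. One simple merging rule, shown below, keeps the rendered prediction wherever it is confident and falls back to the noisy mask elsewhere; both the confidence threshold and this particular rule are assumptions, not the paper's procedure.

```python
import torch

def merge_pseudo_labels(rendered_logits, noisy_labels, confidence=0.8):
    """Build per-pixel pseudo labels by trusting the rendered prediction only where it is
    confident and using the original (noisy) 2D mask elsewhere.

    rendered_logits: (H, W, C) class scores rendered from the 3D Gaussians.
    noisy_labels:    (H, W) integer labels from the 2D segmenter.
    """
    probs = rendered_logits.softmax(dim=-1)
    conf, rendered_labels = probs.max(dim=-1)
    return torch.where(conf > confidence, rendered_labels, noisy_labels)

# Toy usage: a 64x64 view with 5 semantic classes.
pseudo = merge_pseudo_labels(torch.randn(64, 64, 5), torch.randint(0, 5, (64, 64)))
```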
Heterogeneous Experts and Hierarchical Perception for Underwater Salient Object Detection
Mingfeng Zha;Guoqing Wang;Yunqiang Pei;Tianyu Li;Xiongxin Tang;Chongyi Li;Yang Yang;Heng Tao Shen
{"title":"Heterogeneous Experts and Hierarchical Perception for Underwater Salient Object Detection","authors":"Mingfeng Zha;Guoqing Wang;Yunqiang Pei;Tianyu Li;Xiongxin Tang;Chongyi Li;Yang Yang;Heng Tao Shen","doi":"10.1109/TIP.2025.3572760","DOIUrl":"10.1109/TIP.2025.3572760","url":null,"abstract":"Existing underwater salient object detection (USOD) methods design fusion strategies to integrate multimodal information, but lack exploration of modal characteristics. To address this, we separately leverage the RGB and depth branches to learn disentangled representations, formulating the heterogeneous experts and hierarchical perception network (HEHP). Specifically, to reduce modal discrepancies, we propose the hierarchical prototype guided interaction (HPI), which achieves fine-grained alignment guided by the semantic prototypes, and then refines with complementary modalities. We further design the mixture of frequency experts (MoFE), where experts focus on modeling high- and low-frequency respectively, collaborating to explicitly obtain hierarchical representations. To efficiently integrate diverse spatial and frequency information, we formulate the four-way fusion experts (FFE), which dynamically selects optimal experts for fusion while being sensitive to scale and orientation. Since depth maps with poor quality inevitably introduce noises, we design the uncertainty injection (UI) to explore high uncertainty regions by establishing pixel-level probability distributions. We further formulate the holistic prototype contrastive (HPC) loss based on semantics and patches to learn compact and general representations across modalities and images. Finally, we employ varying supervision based on branch distinctions to implicitly construct difference modeling. Extensive experiments on two USOD datasets and four relevant underwater scene benchmarks validate the effect of the proposed method, surpassing state-of-the-art binary detection models. Impressive results on seven natural scene benchmarks further demonstrate the scalability.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3703-3717"},"PeriodicalIF":0.0,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144201879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
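The mixture of frequency experts assigns high- and low-frequency content to separate experts. The sketch below uses a blur-and-subtract split (average pooling as the low-pass, its residual as the high-pass) and single convolutions as stand-in experts; the actual MoFE routing and expert design are not described in the abstract, so everything beyond the split-and-recombine idea is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrequencyExperts(nn.Module):
    """Route low- and high-frequency components of a feature map to separate experts."""

    def __init__(self, ch=64):
        super().__init__()
        self.low_expert = nn.Conv2d(ch, ch, 3, padding=1)   # smooth, semantic content
        self.high_expert = nn.Conv2d(ch, ch, 3, padding=1)  # edges and fine texture

    def forward(self, x):
        # Low-frequency branch: blur by average pooling, then upsample back to full size.
        low = F.interpolate(F.avg_pool2d(x, kernel_size=4), size=x.shape[-2:],
                            mode="bilinear", align_corners=False)
        high = x - low                       # the residual keeps the high-frequency detail
        return self.low_expert(low) + self.high_expert(high)

# Toy usage on a 64-channel, 64x64 feature map.
out = FrequencyExperts()(torch.randn(2, 64, 64, 64))
```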
Enhancing Environmental Robustness in Few-Shot Learning via Conditional Representation Learning
Qianyu Guo;Jingrong Wu;Tianxing Wu;Haofen Wang;Weifeng Ge;Wenqiang Zhang
{"title":"Enhancing Environmental Robustness in Few-Shot Learning via Conditional Representation Learning","authors":"Qianyu Guo;Jingrong Wu;Tianxing Wu;Haofen Wang;Weifeng Ge;Wenqiang Zhang","doi":"10.1109/TIP.2025.3572762","DOIUrl":"10.1109/TIP.2025.3572762","url":null,"abstract":"Few-shot learning (FSL) has recently been extensively utilized to overcome the scarcity of training data in domain-specific visual recognition. In real-world scenarios, environmental factors such as complex backgrounds, varying lighting conditions, long-distance shooting, and moving targets often cause test images to exhibit numerous incomplete targets or noise disruptions. However, current research on evaluation datasets and methodologies has largely ignored the concept of “environmental robustness”, which refers to maintaining consistent performance in complex and diverse physical environments. This neglect has led to a notable decline in the performance of FSL models during practical testing compared to their training performance. To bridge this gap, we introduce a new real-world multi-domain few-shot learning (RD-FSL) benchmark, which includes four domains and six evaluation datasets. The test images in this benchmark feature various challenging elements, such as camouflaged objects, small targets, and blurriness. Our evaluation experiments reveal that existing methods struggle to utilize training images effectively to generate accurate feature representations for challenging test images. To address this problem, we propose a novel conditional representation learning network (CRLNet) that integrates the interactions between training and testing images as conditional information in their respective representation processes. The main goal is to reduce intra-class variance or enhance inter-class variance at the feature representation level. Finally, comparative experiments reveal that CRLNet surpasses the current state-of-the-art methods, achieving performance improvements ranging from 6.83% to 16.98% across diverse settings and backbones. The source code and dataset are available at <uri>https://github.com/guoqianyu-alberta/Conditional-Representation-Learning</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3489-3502"},"PeriodicalIF":0.0,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144201654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
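CRLNet is said to inject the interactions between training and testing images as conditional information into each side's representation process. Bidirectional cross-attention is one natural way to realize that idea; the module below is such a sketch, with the token shapes, head count, and residual connections as assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class ConditionalRepresentation(nn.Module):
    """Condition query-image features on the support set via cross-attention, and vice versa."""

    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.q_from_s = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.s_from_q = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, query_feat, support_feat):
        # query_feat: (B, Nq, D) tokens of the test image; support_feat: (B, Ns, D) support tokens.
        q_cond, _ = self.q_from_s(query_feat, support_feat, support_feat)
        s_cond, _ = self.s_from_q(support_feat, query_feat, query_feat)
        return query_feat + q_cond, support_feat + s_cond   # residual conditioning

# Toy usage: 49 query tokens conditioned on a 5-shot support set of 49 tokens each.
q, s = ConditionalRepresentation()(torch.randn(2, 49, 256), torch.randn(2, 5 * 49, 256))
```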
Deep Multi-View Contrastive Clustering via Graph Structure Awareness
Lunke Fei;Junlin He;Qi Zhu;Shuping Zhao;Jie Wen;Yong Xu
{"title":"Deep Multi-View Contrastive Clustering via Graph Structure Awareness","authors":"Lunke Fei;Junlin He;Qi Zhu;Shuping Zhao;Jie Wen;Yong Xu","doi":"10.1109/TIP.2025.3573501","DOIUrl":"10.1109/TIP.2025.3573501","url":null,"abstract":"Multi-view clustering (MVC) aims to exploit the latent relationships between heterogeneous samples in an unsupervised manner, which has served as a fundamental task in the unsupervised learning community and has drawn widespread attention. In this work, we propose a new deep multi-view contrastive clustering method via graph structure awareness (DMvCGSA) by conducting both instance-level and cluster-level contrastive learning to exploit the collaborative representations of multi-view samples. Unlike most existing deep multi-view clustering methods, which usually extract only the attribute features for multi-view representation, we first exploit the view-specific features while preserving the latent structural information between multi-view data via a GCN-embedded autoencoder, and further develop a similarity-guided instance-level contrastive learning scheme to make the view-specific features discriminative. Moreover, unlike existing methods that separately explore common information, which may not contribute to the clustering task, we employ cluster-level contrastive learning to explore the clustering-beneficial consistency information directly, resulting in improved and reliable performance for the final multi-view clustering task. Extensive experimental results on twelve benchmark datasets clearly demonstrate the encouraging effectiveness of the proposed method compared with the state-of-the-art models.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3805-3816"},"PeriodicalIF":0.0,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144201923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
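Cluster-level contrastive learning typically treats each cluster's soft-assignment vector as the unit to contrast across views. The function below sketches that formulation for two views; the temperature and the symmetric cross-entropy form are assumptions, not DMvCGSA's exact loss.

```python
import torch
import torch.nn.functional as F

def cluster_level_contrastive_loss(assign_a, assign_b, temperature=0.5):
    """Treat each cluster as a unit: the k-th cluster's assignment vector in view A should
    match the k-th cluster in view B and differ from all other clusters.

    assign_a, assign_b: (N, K) soft cluster-assignment probabilities from two views.
    """
    a = F.normalize(assign_a.t(), dim=-1)    # (K, N) cluster representations for view A
    b = F.normalize(assign_b.t(), dim=-1)    # (K, N) cluster representations for view B
    logits = a @ b.t() / temperature         # (K, K) cross-view cluster similarities
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Toy usage: 128 samples assigned to 10 clusters in each of two views.
loss = cluster_level_contrastive_loss(torch.rand(128, 10).softmax(-1),
                                      torch.rand(128, 10).softmax(-1))
```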
Detector With Classifier2: An End-to-End Multi-Stream Feature Aggregation Network for Fine-Grained Object Detection in Remote Sensing Images
Shangdong Zheng;Zebin Wu;Yang Xu;Chengxun He;Zhihui Wei
{"title":"Detector With Classifier2: An End-to-End Multi-Stream Feature Aggregation Network for Fine-Grained Object Detection in Remote Sensing Images","authors":"Shangdong Zheng;Zebin Wu;Yang Xu;Chengxun He;Zhihui Wei","doi":"10.1109/TIP.2025.3563708","DOIUrl":"10.1109/TIP.2025.3563708","url":null,"abstract":"Fine-grained object detection (FGOD) fundamentally comprises two primary tasks: object detection and fine-grained classification. In natural scenes, most FGOD methods benefit from higher instance resolution and fewer environmental variation, attributing more commonly associated with the latter task. In this paper, we propose a unified paradigm named Detector with Classifier2 (DC2), which provides a holistic paradigm by explicitly considering the end-to-end integration of object detection and fine-grained classification tasks, rather than prioritizing one aspect. Initially, our detection sub-network is restricted to only determining whether the proposal is a coarse-category and does not delve into the specific sub-categories. Moreover, in order to reduce redundant pixel-level calculation, we propose an instance-level feature enhancement (IFE) module to model the semantic similarities among proposals, which poses great potential for locating more instances in remote sensing images (RSIs). After obtaining the coarse detection predictions, we further construct a classification sub-network, which is built on top of the former branch to determine the specific sub-categories of the aforementioned predictions. Importantly, the detection network is performed on the complete image, while the classification network conducts secondary modeling for the detected regions. These operations can be denoted as the global contextual information and local intrinsic cues extractions for each instance. Therefore, we propose a multi-stream feature aggregation (MSFA) module to integrate global-stream semantic information and local-stream discriminative cues. Our whole DC2 network follows an end-to-end learning fashion, which effectively excavates the internal correlation between detection and fine-grained classification networks. We evaluate the performance of our DC2 network on two benchmarks SAT-MTB and HRSC2016 datasets. Importantly, our method achieves the new state-of-the-art results compared with recent works (approximately 7% mAP gains on SAT-MTB) and improves baseline by a significant margin (43.2% <inline-formula> <tex-math>$v.s.~36.7$ </tex-math></inline-formula>%) without any complicated post-processing strategies. Source codes of the proposed methods are available at <uri>https://github.com/zhengshangdong/DC2</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"2707-2720"},"PeriodicalIF":0.0,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143893504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
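The instance-level feature enhancement module is described as modeling semantic similarities among proposals. A minimal reading of that is a similarity-weighted aggregation over the proposal features of one image, sketched below; the temperature and the residual form are assumptions rather than the IFE design.

```python
import torch
import torch.nn.functional as F

def instance_feature_enhancement(proposal_feats, temperature=0.1):
    """Enhance each proposal feature with a similarity-weighted sum of the other proposals,
    so semantically related instances reinforce each other.

    proposal_feats: (N, D) pooled features of the N proposals in one image.
    """
    normed = F.normalize(proposal_feats, dim=-1)
    sim = normed @ normed.t()                          # (N, N) cosine similarities
    weights = (sim / temperature).softmax(dim=-1)      # attention over proposals
    return proposal_feats + weights @ proposal_feats   # residual aggregation

# Toy usage: 100 proposals with 256-d features.
enhanced = instance_feature_enhancement(torch.randn(100, 256))
```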
Learning to See Low-Light Images via Feature Domain Adaptation
Qirui Yang;Qihua Cheng;Huanjing Yue;Le Zhang;Yihao Liu;Jingyu Yang
{"title":"Learning to See Low-Light Images via Feature Domain Adaptation","authors":"Qirui Yang;Qihua Cheng;Huanjing Yue;Le Zhang;Yihao Liu;Jingyu Yang","doi":"10.1109/TIP.2025.3563775","DOIUrl":"10.1109/TIP.2025.3563775","url":null,"abstract":"Raw low-light image enhancement (LLIE) has achieved much better performance than the sRGB domain enhancement methods due to the merits of raw data. However, the ambiguity between noisy to clean and raw to sRGB mappings may mislead the single-stage enhancement networks. The two-stage networks avoid ambiguity by step-by-step or decoupling the two mappings but usually have large computing complexity. To solve this problem, we propose a single-stage network empowered by Feature Domain Adaptation (FDA) to decouple the denoising and color mapping tasks in raw LLIE. The denoising encoder is supervised by the clean raw image, and then the denoised features are adapted for the color mapping task by an FDA module. We propose a Lineformer to serve as the FDA, which can well explore the global and local correlations with fewer line buffers (friendly to the line-based imaging process). During inference, the raw supervision branch is removed. In this way, our network combines the advantage of a two-stage enhancement process with the efficiency of single-stage inference. Experiments on four benchmark datasets demonstrate that our method achieves state-of-the-art performance with fewer computing costs (60% FLOPs of the two-stage method DNF). Our codes will be released after the acceptance of this work.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"2680-2693"},"PeriodicalIF":0.0,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143889984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
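The training scheme supervises a denoising encoder with the clean raw image while an FDA module adapts the denoised features for color mapping; the raw supervision branch is dropped at inference. The skeleton below mirrors only that branch layout; the layers are placeholders (a 1x1 convolution stands in for the Lineformer), so it is an illustration under stated assumptions, not the paper's network.

```python
import torch
import torch.nn as nn

class SingleStageRawLLIE(nn.Module):
    """Denoising encoder with an auxiliary raw-reconstruction head; the adapted features
    feed a color-mapping decoder. The raw head exists only to provide training supervision."""

    def __init__(self, ch=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(4, ch, 3, padding=1), nn.ReLU(inplace=True),
                                     nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.raw_head = nn.Conv2d(ch, 4, 3, padding=1)             # supervised by the clean raw
        self.fda = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.GELU())  # placeholder for the Lineformer
        self.color_head = nn.Conv2d(ch, 3, 3, padding=1)           # maps adapted features to sRGB

    def forward(self, noisy_raw, training=True):
        feat = self.encoder(noisy_raw)
        srgb = self.color_head(self.fda(feat))
        # At inference the raw branch is skipped, matching the single-stage deployment.
        return (self.raw_head(feat), srgb) if training else srgb

# Toy usage on a packed 4-channel raw patch; both outputs receive supervision during training.
denoised_raw, srgb = SingleStageRawLLIE()(torch.randn(1, 4, 64, 64))
```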