IEEE Transactions on Image Processing: Latest Articles (Page 6)

UMCFuse: A Unified Multiple Complex Scenes Infrared and Visible Image Fusion Framework

IF 10.6, Zone 1 (Computer Science)

IEEE Transactions on Image Processing Pub Date : 2025-09-16 DOI: 10.1109/tip.2025.3607623

Xilai Li,Xiaosong Li,Tianshu Tan,Huafeng Li,Tao Ye

Abstract: Infrared and visible image fusion has emerged as a prominent research area in computer vision. However, little attention has been paid to fusion in complex scenes, leading to sub-optimal results under interference. To fill this gap, we propose a unified framework for infrared and visible image fusion in complex scenes, termed UMCFuse. Specifically, we classify the pixels of visible images according to the degree of scattering in light transmission, allowing us to separate fine details from overall intensity. Maintaining a balance between interference removal and detail preservation is essential for the generalization capacity of the proposed method. Therefore, we propose an adaptive denoising strategy for the fusion of detail layers. Meanwhile, we fuse the energy features from different modalities by analyzing them from multiple directions. Extensive fusion experiments on real and synthetic complex-scene datasets, covering adverse weather conditions, noise, blur, overexposure, and fire, as well as downstream tasks including semantic segmentation, object detection, salient object detection, and depth estimation, consistently indicate the superiority of the proposed method over recent representative methods. Our code is available at https://github.com/ixilai/UMCFuse.

Citations: 0

HOPE: Enhanced Position Image Priors via High-Order Implicit Representations

IF 10.6, Zone 1 (Computer Science)

IEEE Transactions on Image Processing Pub Date : 2025-09-16 DOI: 10.1109/tip.2025.3607582

Yang Chen,Ruituo Wu,Junhui Hou,Ce Zhu,Yipeng Liu

Abstract: Deep Image Prior (DIP) has shown that networks with stochastic initialization and custom architectures can effectively address inverse imaging challenges. Despite its potential, DIP requires significant computational resources, whereas the lighter Implicit Neural Positional Image Prior (PIP) often yields overly smooth solutions due to exacerbated spectral bias. Research on lightweight, high-performance solutions for inverse imaging remains limited. This paper proposes a novel framework, Enhanced Positional Image Priors through High-Order Implicit Representations (HOPE), incorporating high-order interactions between layers within a conventional cascade structure. This approach reduces the spectral bias commonly seen in PIP, enhancing the model's ability to capture both low- and high-frequency components for optimal inverse problem performance. We theoretically demonstrate that HOPE's expanded representational space, narrower convergence range, and improved Neural Tangent Kernel (NTK) diagonal properties enable more precise frequency representations than PIP. Comprehensive experiments across tasks such as signal representation (audio, image, volume) and inverse image processing (denoising, super-resolution, CT reconstruction, inpainting) confirm that HOPE establishes new benchmarks for recovery quality and training efficiency.

Citations: 0

Fast Video Recoloring via Curve-based Palettes

IF 10.6, Zone 1 (Computer Science)

IEEE Transactions on Image Processing Pub Date : 2025-09-16 DOI: 10.1109/tip.2025.3607584

Zheng-Jun Du,Jia-Wei Zhou,Kang Li,Jian-Yu Hao,Zi-Kang Huang,Kun Xu

Abstract: Color grading, as a crucial step in film post-production, plays an important role in emotional expression and artistic enhancement. Recently, a geometric palette-based approach to video recoloring has been introduced with impressive results. It offers an intuitive interface that allows users to alter the color of a video by manipulating a limited set of representative colors. However, this method has two primary limitations. First, palette extraction is computationally expensive, often taking more than one hour to generate palettes even for medium-length videos, which significantly limits the practical application of color editing for longer videos. Second, the palette colors are less representative, and some primary colors may be omitted from the resulting palettes during topological simplification, making color editing less intuitive. To overcome these limitations, we propose a novel approach to video recoloring. The core of our method is a set of Bézier curves that connect the dominant colors throughout the input video. By slicing these Bézier curves in RGBT space, per-frame palettes can be naturally derived. During recoloring, users can select several frames of interest and modify their corresponding palettes to change the color of the video. Our method is simple and intuitive, enabling compelling time-varying recoloring results. Compared to existing methods, our approach is more efficient in palette extraction and can effectively capture the dominant colors of the video. Extensive experiments demonstrate the effectiveness of our method.
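The idea of slicing Bézier curves in RGBT space to obtain a per-frame palette can be illustrated with a small sketch. This is not the authors' implementation: the cubic degree, the helper names `bezier`, `slice_at`, and `frame_palette`, and the assumption that the time coordinate T increases monotonically along each curve are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float, float, float]  # (R, G, B, T), hypothetical layout


def bezier(ctrl: List[Point], u: float) -> Point:
    """Evaluate a cubic Bezier curve with four control points at u in [0, 1]."""
    w = ((1 - u) ** 3, 3 * u * (1 - u) ** 2, 3 * u ** 2 * (1 - u), u ** 3)
    return tuple(sum(wi * p[k] for wi, p in zip(w, ctrl)) for k in range(4))


def slice_at(ctrl: List[Point], t: float, iters: int = 60) -> Tuple[float, float, float]:
    """Return the RGB colour where the curve crosses time t.

    Assumes (for this sketch only) that T is monotonically increasing
    along the curve, so the crossing can be found by bisection on u.
    """
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if bezier(ctrl, mid)[3] < t:
            lo = mid
        else:
            hi = mid
    r, g, b, _ = bezier(ctrl, (lo + hi) / 2)
    return (r, g, b)


def frame_palette(curves: List[List[Point]], t: float) -> List[Tuple[float, float, float]]:
    """One palette colour per curve for the frame at normalised time t."""
    return [slice_at(c, t) for c in curves]
```

With one curve per dominant color, `frame_palette(curves, t)` yields the palette for the frame at normalised time `t`; editing control points would then change the palette smoothly over time.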

Citations: 0

Spatio-Temporal Evolutionary Graph Learning for Brain Network Analysis using Medical Imaging

IF 10.6, Zone 1 (Computer Science)

IEEE Transactions on Image Processing Pub Date : 2025-09-16 DOI: 10.1109/tip.2025.3607633

Shengrong Li,Qi Zhu,Chunwei Tian,Li Zhang,Bo Shen,Chuhang Zheng,Daoqiang Zhang,Wei Shao

Abstract: Dynamic functional brain networks (DFBNs) can flexibly describe the time-varying topological connectivity patterns of the brain, and show great potential in brain disease diagnosis. However, most existing DFBN analysis methods focus on capturing the dynamic interaction at the brain region level, ignoring the spatio-temporal topological evolution across time windows. Moreover, they struggle to suppress interfering connections in DFBNs, which leads to a diminished capacity for discerning the intrinsic structures that are intimately linked to brain disorders. To address these issues, we propose a topological evolution graph learning model to capture disease-related spatio-temporal topological features in DFBNs. Specifically, we first take the hubness of adjacent DFBNs as the source domain and the target domain in turn, and then use Wasserstein distance (WD) and Gromov-Wasserstein distance (GWD) to capture the brain's evolution law at the node and edge levels, respectively. Furthermore, we introduce the principle of relevant information to guide the topology evolution graph to learn the structures that are most relevant to brain diseases while carrying the least redundant information between adjacent DFBNs. On this basis, we develop a high-order spatio-temporal model with multi-hop graph convolution to collaboratively extract long-range spatial and temporal dependencies from the topological evolution graph. Extensive experiments show that the proposed method outperforms the current state-of-the-art methods, and can effectively reveal the information evolution mechanism between brain regions across windows.
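As a rough illustration of the node-level comparison mentioned in the abstract, the sketch below computes a one-dimensional Wasserstein-1 distance between the hubness values of two adjacent network snapshots. It rests on assumptions not stated in the source: hubness is approximated by node strength (the sum of a node's edge weights), and the function names `node_strength` and `wasserstein_1d` are hypothetical. The edge-level Gromov-Wasserstein term is not covered.

```python
from typing import List


def node_strength(adj: List[List[float]]) -> List[float]:
    """Sum of edge weights per node, ignoring self-loops (a stand-in for hubness)."""
    return [sum(w for j, w in enumerate(row) if j != i) for i, row in enumerate(adj)]


def wasserstein_1d(a: List[float], b: List[float]) -> float:
    """Wasserstein-1 distance between two equally sized, uniformly weighted samples.

    In one dimension the optimal transport plan matches sorted values,
    so the distance is the mean absolute difference of the sorted samples.
    """
    if len(a) != len(b):
        raise ValueError("samples must have the same length")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


# Two toy 3-region networks from adjacent time windows (made-up values).
window_t = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
window_t1 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
shift = wasserstein_1d(node_strength(window_t), node_strength(window_t1))
```

A larger `shift` indicates that the distribution of hub strength changed more between the two windows.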

Citations: 0

Semi-supervised Text-based Person Search

IF 10.6, Zone 1 (Computer Science)

IEEE Transactions on Image Processing Pub Date : 2025-09-16 DOI: 10.1109/tip.2025.3607637

Daming Gao, Yang Bai, Min Cao, Hao Dou, Mang Ye, Min Zhang

Citations: 0

Exploring Multimodal Knowledge for Image Compression via Large Foundation Models

IF 10.6, Zone 1 (Computer Science)

IEEE Transactions on Image Processing Pub Date : 2025-09-16 DOI: 10.1109/tip.2025.3607616

Junlong Gao,Zhimeng Huang,Qi Mao,Siwei Ma,Chuanmin Jia

Abstract: Knowledge is an abstraction of factual principles of the physical world. Large foundation models encapsulate extensive multimodal knowledge into the parameters and thus invoke machine intelligence on various tasks. How to invoke the knowledge in these models to facilitate image compression lacks in-depth exploration. In this work, we aim to harness multimodal knowledge for ultra-low bitrate compression and propose Multimodal Knowledge-aware Image Compression (MKIC). Our key insight is that in the context of ultra-low bitrate compression, where the encoded representation is too sparse to represent enough information about the input signal, knowledge from the physical world must be incorporated into the compression. Thus, more shared patterns can be stored in the model, while sparse unique features are embedded into the bitstream. In light of two kinds of knowledge, namely natural visual knowledge and human language knowledge, we propose a novel Alternating Rate-Distortion Optimization to enhance the accuracy and compactness of global semantic text representation extraction, extract the local feature map that captures visual details, and integrate these multimodal representations into a large generative foundation model to achieve high-quality reconstruction. The proposed method relights the path of learned image coding, leveraging decoupled knowledge from large foundation models. Extensive experiments show that our proposed method achieves superior comprehensive performance compared to various methods and shows great potential for ultra-low bitrate image compression.

Citations: 0

Synergistic Prompting Learning for Human-Object Interaction Detection

IF 10.6, Zone 1 (Computer Science)

IEEE Transactions on Image Processing Pub Date : 2025-09-16 DOI: 10.1109/tip.2025.3607614

Jinguo Luo,Weihong Ren,Zhiyong Wang,Xi'ai Chen,Huijie Fan,Zhi Han,Honghai Liu

Abstract: Human-Object Interaction (HOI) detection, as a foundational task in human-centric understanding, aims to detect interactive triplets in real-world scenarios. To better distinguish diverse HOIs within an open-world context, current HOI detectors utilize pre-trained Visual-Language Models (VLMs) to extract prior knowledge through textual prompts (i.e., descriptive texts for each HOI instance). However, relying on predetermined descriptive texts, such approaches only acquire a fixed set of textual knowledge for HOI prediction, consequently resulting in inferior performance and limited generalization. To remedy this, we propose a novel VLM-based method, which jointly performs prompting learning from both visual and textual perspectives and synergizes visual-textual prompting for HOI detection. Initially, we design a hierarchical adaptation architecture to perform progressive prompting: visual prompting is facilitated through gradual token migration from VLM's image encoder, while textual prompting is initialized with progressively leveled interaction descriptions. In addition, to synergize the visual-textual prompting learning, a text-supervising and image-tuning loop is introduced, in which the text-supervising stage guides visual prompting learning through contrastive learning and the image-tuning stage refines textual prompting by modal matching. Finally, we employ an interaction-aware knowledge merging mechanism to effectively transfer visual-textual knowledge encapsulated within synergistic prompting for HOI detection. Extensive experiments on two benchmarks demonstrate that our proposed method outperforms the state-of-the-art ones, under both supervised and zero-shot settings.

Citations: 0

A Multi-Category Anomaly Editing Network with Correlation Exploration and Voxel-level Attention for Unsupervised Surface Anomaly Detection

IF 10.6, Zone 1 (Computer Science)

IEEE Transactions on Image Processing Pub Date : 2025-09-16 DOI: 10.1109/tip.2025.3607638

Ruifan Zhang,Hai-Miao Hu

Abstract: Developing a unified model for surface anomaly detection remains challenging due to significant variations across product categories. Recent feature editing methods, as a branch of image reconstruction, mitigate the over-generalization of auto-encoders that leads to accurate anomaly reconstruction. However, these methods are only suited for texture-category products and have significant limitations in being generalized to other categories. In this article, we propose a multi-category anomaly editing network with a dual-branch training approach: one branch processes defect-free images (normal branch), while the other handles synthetic anomaly images (anomaly branch). Specifically, the paired samples are first fed into the multi-category anomaly feature editing based auto-encoder (MCAFE-AE) to perform image reconstruction and inpainting. In the normal branch, we propose a dual-entropy constrained deep embedded clustering module (DEC-DECM) to promote a more compact and orderly distribution of normal latent features, while avoiding trivial clustering solutions. Based on the clustering results, we further design a patch-based adaptive thresholding (PAT) strategy to adaptively calculate the threshold representing the central boundary of the cluster center for each local patch, thereby enabling the model to detect anomalies. Then, in the anomaly branch, we propose a multi-category anomaly feature editing module (MCAFEM) to identify anomalies in synthetic images and apply a category-oriented feature editing strategy to transform detected anomaly features into normal ones, thereby suppressing the reconstruction of anomalies. After completing the image reconstruction and inpainting, the input images from both branches and their respective output images are concatenated and fed into the correlation exploration and voxel-level attention based prediction network (CEVA-Net) for anomaly segmentation. The network is integrated with our proposed correlation-dependency exploration and voxel-level attention refinement module (CDE-VARM) and generates precise anomaly maps under the guidance of the bidirectional-path feature fusion (BPFF) and deep supervised learning (DSL). Extensive experiments on three datasets show that our method achieves state-of-the-art performance.

Citations: 0

Dual-Space Topological Isomorphism and Maximization of Predictive Diversity for Unsupervised Domain Adaptation

IF 10.6, Zone 1 (Computer Science)

IEEE Transactions on Image Processing Pub Date : 2025-09-16 DOI: 10.1109/tip.2025.3608670

Mengru Wang,Jinglei Liu

Abstract: Most existing unsupervised domain adaptation methods rely on explicitly or implicitly aligning the features of source and target domains to construct a domain-invariant space, often using entropy minimization to reduce uncertainty and confusion. However, this approach faces two challenges: 1) Explicit alignment reduces discriminability, while implicit alignment risks pseudo-label noise, making it hard to balance structure preservation and alignment. 2) Sole reliance on entropy minimization can lead to trivial solutions in UDA, where all samples collapse into a single class. To address these issues, we propose Dual-Space Topological Isomorphism and Maximization of Predictive Diversity (DTI-MPD). Topological isomorphism is a continuous, bijective mapping that preserves the topological properties of two spaces, ensuring the global structure and relationships of data remain intact during alignment. Our method aligns source and target domain data in two independent spaces while balancing the effects of entropy minimization through predictive diversity maximization. The core of dual-space topological isomorphism lies in establishing a reversible correspondence between the source and target domains, avoiding information loss during alignment and preserving the global structural and topological characteristics of the data. Meanwhile, predictive diversity maximization mitigates the class collapse caused by entropy minimization, ensuring a more balanced predictive distribution across categories. This approach effectively overcomes the aforementioned issues, enabling better adaptation to new data. Extensive experiments demonstrate that our method achieves state-of-the-art performance on multiple benchmark datasets, validating its effectiveness.
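The class-collapse issue described in the abstract can be made concrete with a small numerical sketch. It uses a commonly seen formulation (mean per-sample entropy minus the entropy of the mean prediction), which is an assumption here and not necessarily the DTI-MPD objective; the function names and the weight `lam` are hypothetical.

```python
import math
from typing import List


def entropy(p: List[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)


def mean_conditional_entropy(probs: List[List[float]]) -> float:
    """Average per-sample prediction entropy (the term entropy minimization reduces)."""
    return sum(entropy(p) for p in probs) / len(probs)


def marginal_entropy(probs: List[List[float]]) -> float:
    """Entropy of the batch-averaged prediction (a proxy for predictive diversity)."""
    k = len(probs[0])
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(k)]
    return entropy(mean)


def objective(probs: List[List[float]], lam: float = 1.0) -> float:
    """Lower is better: confident per-sample predictions, diverse batch-level predictions."""
    return mean_conditional_entropy(probs) - lam * marginal_entropy(probs)


collapsed = [[1.0, 0.0], [1.0, 0.0]]  # every sample assigned to class 0
balanced = [[1.0, 0.0], [0.0, 1.0]]   # equally confident, but spread over classes
```

Both batches have zero per-sample entropy, so entropy minimization alone cannot tell them apart; the diversity term gives `balanced` the lower objective value.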

Citations: 0

Streaming View Classification with Noisy Label

IF 10.6, Zone 1 (Computer Science)

IEEE Transactions on Image Processing Pub Date : 2025-09-16 DOI: 10.1109/tip.2025.3607610

Xiao Ouyang,Ruidong Fan,Hong Tao,Chenping Hou

Abstract: In many image processing tasks, e.g., 3D reconstruction of dynamic scenes, different types of descriptions of an object, a.k.a. views, emerge in a streaming way. Streaming view learning provides an effective solution to this dynamic view problem. In this paradigm, existing streaming view learning methods typically assume that all labels are accurate. However, in many real-world applications, the initial views may not be sufficiently descriptive, leading to noisy labels that degrade classification performance. How to learn a model under simultaneous view evolution and label ambiguity is critical yet unexplored. In this paper, we propose a novel method called Streaming View Classification with Noisy Label (SVCNL). We calibrate noisy labels as new views emerge, thereby reflecting the dynamic changes in the data more accurately. Leveraging the sequential and non-revisitable nature of views, the method tunes existing models to inherit information from previous stages by utilizing current-stage data. It reconstructs noisy labels through a label transition matrix and establishes relationships between true labels and samples using a graph embedding strategy, progressively correcting noisy labels. Together with the theoretical analyses about generalization bounds, extensive experiments demonstrate the effectiveness of the proposed approach.
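For readers unfamiliar with label transition matrices, the sketch below shows the generic mechanism: a row-stochastic matrix `T`, where `T[i][j]` is the probability that true class `i` is observed as noisy class `j`, maps a model's clean-class posterior to a distribution over noisy labels. This is a textbook-style illustration under assumed values, not the SVCNL procedure; how the paper estimates `T` or combines it with graph embedding is not described in the abstract, and the function names are hypothetical.

```python
import math
from typing import List


def noisy_posterior(clean_probs: List[float], T: List[List[float]]) -> List[float]:
    """p(noisy = j) = sum_i p(clean = i) * T[i][j]."""
    k = len(T)
    return [sum(clean_probs[i] * T[i][j] for i in range(k)) for j in range(k)]


def forward_loss(clean_probs: List[float], T: List[List[float]], noisy_label: int) -> float:
    """Negative log-likelihood of the observed (possibly wrong) label."""
    return -math.log(noisy_posterior(clean_probs, T)[noisy_label])


# Assumed transition matrix: class 0 is mislabelled as class 1 with probability 0.2,
# class 1 is mislabelled as class 0 with probability 0.1.
T = [[0.8, 0.2],
     [0.1, 0.9]]
pred = noisy_posterior([0.5, 0.5], T)
```

Training against `forward_loss` lets the model keep predicting clean classes while the mismatch with observed labels is absorbed by `T`.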

Citations: 0