Adaptive 3D Convolution for Remote Sensing Image Fusion
Siran Peng, Xiangyu Zhu, Shang-Qi Deng, Liang-Jian Deng, Zhen Lei
IEEE Transactions on Image Processing, DOI: 10.1109/TIP.2026.3689418, published 2026-05-08

Abstract: Remote sensing image fusion aims to create a high-resolution multi/hyper-spectral image from a high-resolution image with limited spectral information and a low-resolution image with abundant spectral data. Recently, deep learning (DL) techniques have shown significant effectiveness in this area. Most DL-based methods approach image fusion as a 2D problem by encoding spectral information into feature map channels. However, our research suggests that this strategy introduces notable spectral distortions. In contrast, some methods consider spectral data as an additional dimension, utilizing standard 3D convolutions to preserve spectral information. Nevertheless, in a standard 3D convolutional layer, the same set of kernels is applied across all input regions, which we have found to be sub-optimal for image fusion. Furthermore, standard 3D convolutions necessitate substantial computational resources. To address these challenges, we propose a novel convolutional paradigm called Adaptive 3D Convolution (Ada3D) for remote sensing image fusion. Ada3D applies a unique set of 3D kernels to each input voxel, enabling the capture of fine-grained details. These adaptive kernels are generated through a two-step process: (i) spatial and spectral kernels are derived from their respective image sources; (ii) these two types of kernels are then combined to form content-aware 3D kernels that effectively integrate spatial and spectral information. Additionally, adaptive biases are introduced to enhance the convolutional outcome at the voxel level. Furthermore, we incorporate the group convolution technique to reduce computational complexity. As a result, Ada3D offers full adaptivity in an efficient manner. Evaluation results across five datasets demonstrate that our method achieves state-of-the-art (SOTA) performance, underscoring the superiority of Ada3D. The code is available at https://github.com/PSRben/Ada3D.

Towards Robust Alignment for Video Dehazing with Temporal Lookup Table
Haoyou Deng, Zhiqiang Li, Feng Zhang, Bin Xu, Qingbo Lu, Changxin Gao, Nong Sang
IEEE Transactions on Image Processing, DOI: 10.1109/TIP.2026.3689423, published 2026-05-08

Abstract: Video dehazing aims to restore clean scenarios from a sequence of hazy frames, where frame alignment is a critical stage for leveraging temporal information. However, haze degrades contrast and obscures details, making alignment challenging. Existing methods ignore the impairment of haze on alignment and thus struggle to align frames accurately. To address this challenge, we propose an alignment network with the temporal lookup table (temporal-LUT), which effectively enhances the haze-degraded frames and provides vivid cues for precise alignment. Specifically, to tackle the color degradation of haze, we employ a learnable lookup table (LUT) to enhance hazy color. The color mapping nature of LUT favorably preserves the naturalness of enhanced outcomes. Besides, we introduce a temporal weight prediction strategy to strengthen inter-frame interaction, which ensures temporal consistency across enhanced results and thereby benefits alignment. Extensive experimental results on two widely used benchmarks and real-world scenes demonstrate the superiority of our method.

OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving
Lening Wang, Wenzhao Zheng, Yilong Ren, Han Jiang, Zhiyong Cui, Haiyang Yu, Jiwen Lu
IEEE Transactions on Image Processing, DOI: 10.1109/TIP.2026.3687468, published 2026-05-07

Abstract: Understanding the evolution of 3D scenes is crucial for autonomous driving. While conventional methods describe scene development through individual instance motions, world models provide a generative framework for modeling overall scene dynamics. However, most existing approaches rely on autoregressive next-token prediction, which suffers from error accumulation and limited global spatiotemporal reasoning, leading to degraded long-term consistency. To address these issues, we propose a diffusion-based 4D occupancy generation model, OccSora, to simulate 3D world evolution for autonomous driving. A 4D scene tokenizer is introduced to obtain compact spatiotemporal representations and enable high-quality reconstruction of long occupancy sequences. We then train a diffusion transformer on these representations to generate 4D occupancy conditioned on trajectory prompts. Experiments on the nuScenes dataset with Occ3D annotations show that OccSora can generate 16-second videos with authentic 3D layout and strong temporal consistency. With trajectory-aware 4D generation, OccSora has the potential to serve as a world simulator for autonomous driving decision-making.
{"title":"SAGD: Boundary-Enhanced Segment Anything in 3D Gaussian via Gaussian Decomposition.","authors":"Xu Hu, Yuxi Wang, Lue Fan, Chuanchen Luo, Junsong Fan, Zhen Lei, Qing Li, Junran Peng, Zhaoxiang Zhang","doi":"10.1109/TIP.2026.3689408","DOIUrl":"https://doi.org/10.1109/TIP.2026.3689408","url":null,"abstract":"<p><p>3D Gaussian Splatting has emerged as an alternative 3D representation for novel view synthesis, benefiting from its high-quality rendering results and real-time rendering speed. However, the 3D Gaussians learned by 3D-GS have ambiguous structures without any geometry constraints. This inherent issue in 3D-GS leads to a rough boundary when segmenting individual objects. To remedy these problems, we propose SAGD, a conceptually simple yet effective boundary-enhanced segmentation pipeline for 3D-GS to improve segmentation accuracy while preserving segmentation speed. Specifically, we introduce a Gaussian Decomposition scheme, which ingeniously utilizes the special structure of 3D Gaussians, finds out, and then decomposes the boundary Gaussians. Moreover, to achieve fast interactive 3D segmentation, we introduce a novel training-free pipeline by lifting a 2D foundation model to 3D-GS. Extensive experiments demonstrate that our approach achieves high-quality 3D segmentation without rough boundary issues, which can be easily applied to other scene editing tasks. Our code is publicly available at https://github.com/XuHu0529/SAGS.</p>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"PP ","pages":""},"PeriodicalIF":13.7,"publicationDate":"2026-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147847702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prompting Rain Off: Evolving Compact Dual Prompts for Continual De-Raining.","authors":"Minghao Liu, Wenhan Yang, Jiaying Liu","doi":"10.1109/TIP.2026.3689428","DOIUrl":"https://doi.org/10.1109/TIP.2026.3689428","url":null,"abstract":"<p><p>In recent years, there has been notable progress in single-image rain removal, particularly focusing on static data distributions in these approaches. When dealing with data that constantly changes, the challenge of catastrophic forgetting arises, which is quite common and critical in real-world scenarios. To address this, we propose Evolving COmpact Dual Prompt Learning (EcoDPL), an efficient rehearsal-free continual learning deraining framework designed specifically for low-level vision tasks. Specifically, we design two prompt pools at both image and feature levels and insert these prompts into images and embedding tokens, for better knowledge transfer across tasks. Our adaptive weight generation module, P-Fuser, attaches an attention map to each prompt, to adaptively pay attention to different inputs, and get different weights to fuse prompts, making the inserted prompts more flexible with various inputs. Also, we introduce Grad-Tuner, a dictionary learning strategy, to compress knowledge into fewer prompts. This makes the knowledge more compact and provides more space for new prompts to learn new tasks. Our method stands out by leveraging small, learnable prompts for efficient knowledge retention across tasks, not increasing training time or parameters. Furthermore, we present an augmented method that upgrades the distance function γ from simple cosine distance to a more advanced weight generation network. We also employ a fine-tuned dictionary learning technique, compressing knowledge into a more compact form, and enhancing the ability of prompts to learn new tasks. With our new designs, the model becomes more flexible with various inputs and it compresses knowledge into fewer prompts to free up spaces to learn new tasks. Through extensive experiments on various rain removal datasets, our EcoDPL method consistently outperforms previous continual learning techniques. Notably, although EcoDPL is designed for continual learning with changing data, it also performs well with stationary data, proving its robustness and versatility.</p>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"PP ","pages":""},"PeriodicalIF":13.7,"publicationDate":"2026-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147847659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Hierarchical Causal Learning for Face Age Synthesis
Ye Wang, Pan Sun, Xuyang Zhou, Lifeng Shen, Jiaxu Leng, Guoyin Wang, Hong Yu
IEEE Transactions on Image Processing, DOI: 10.1109/TIP.2026.3689413, published 2026-05-07

Abstract: Face age synthesis (FAS) predicts a person's future or past facial appearance. In FAS, modifying one facial attribute usually affects the generation of other attributes during face image generation. Current models directly learn entangled representations of age-related features, resulting in insufficient feature disentanglement, which consequently impairs their causal reasoning capability for FAS tasks. To this end, we propose a hierarchical causal learning model for face age synthesis (HCFace), which integrates hierarchical structures and causal relationships into the facial generative model. Specifically, we propose to leverage hierarchical causal relationships to align with facial features for feature disentanglement. Furthermore, we design a novel nonlinear mapping function that captures the true patterns of facial attribute changes with age, enhancing the disentanglement of these attributes. We conduct extensive experiments to validate the superiority of our proposed model. Compared to other advanced baseline methods, HCFace improves overall accuracy by 2.47%, with improvements of 9.75% and 9.69% in certain age-related attributes, such as skin and hair. Our source code is available at https://github.com/SE-hash/HCFace.
{"title":"SCASeg: Strip Cross-Attention for Efficient Semantic Segmentation.","authors":"Guoan Xu, Jiaming Chen, Wenfeng Huang, Wenjing Jia, Guangwei Gao, Guo-Jun Qi","doi":"10.1109/TIP.2026.3688157","DOIUrl":"https://doi.org/10.1109/TIP.2026.3688157","url":null,"abstract":"<p><p>The Vision Transformer (ViT) has achieved notable success in computer vision, with its variants widely validated across various downstream tasks, including semantic segmentation. However, as general-purpose visual encoders, ViT back-bones often do not fully address the specific requirements of task decoders, highlighting opportunities for designing decoders optimized for efficient semantic segmentation. This paper proposes Strip Cross-Attention (SCASeg), an innovative decoder head specifically designed for semantic segmentation. Instead of relying on the conventional skip connections, we utilize lateral connections between encoder and decoder stages, leveraging encoder features as Queries in cross-attention modules. Additionally, we introduce a Cross-Layer Block (CLB) that integrates hierarchical feature maps from various encoder and decoder stages to form a unified representation for Keys and Values. The CLB also incorporates the local perceptual strengths of convolution, enabling SCASeg to capture both global and local context dependencies across multiple layers, thus enhancing feature interaction at different scales and improving overall efficiency. To further optimize computational efficiency, SCASeg compresses the channels of queries and keys into one dimension, creating strip-like patterns that reduce memory usage and increase inference speed compared to traditional vanilla cross-attention. Experiments show that SCASeg's adaptable decoder delivers competitive performance across various setups, outperforming leading segmentation architectures on benchmark datasets, including ADE20K, Cityscapes, COCO-Stuff 164k, and Pascal VOC2012, even under diverse computational constraints.</p>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"PP ","pages":""},"PeriodicalIF":13.7,"publicationDate":"2026-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147847845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Context-Infused Trajectories: Enhancing Context and Frame Consistency in Reasoning Video Object Segmentation
Yunzhi Zhuge, Sitong Gong, Lu Zhang, Qi Xu, Wenda Zhao, Jin Zhan, Huchuan Lu
IEEE Transactions on Image Processing, DOI: 10.1109/TIP.2026.3689427, published 2026-05-06

Abstract: Reasoning video object segmentation (ReaVOS) aims to segment referred objects in video sequences based on implicit and complex linguistic queries. Existing methods typically compress limited video frames into pooled representations and prompt multimodal large language models (MLLMs) to generate a single global segmentation token. However, this strategy lacks explicit contextual guidance and causes substantial loss of spatial details, limiting reasoning capability and segmentation consistency. To overcome these limitations, we introduce Context-infused Consistent Video Segmentor (CiCVS), a novel framework leveraging contextual information to guide the generation of temporally coherent and accurate mask trajectories. CiCVS incorporates a Hierarchical Frame Sampling (HFS) module, which globally samples support frames across the entire video to ensure broad temporal coverage, and then uniformly selects target frames within the support set. It also employs a Contextual Token Prompting (CTP) module, which utilizes contextual cues from support frames to guide the MLLM in generating specialized tokens for various target frames, enabling the model to capture intricate temporal patterns and ensure consistency across long-range sequences. At the core of CTP is the Multimodal Injection Compressor (MIC) block, which efficiently integrates support frame features and textual semantic information into a compact set of latent queries, enhancing temporal-level object perception. To further advance the ReaVOS field, we introduce the CoCoRVOS benchmark, which features more temporally intricate reasoning instructions and a diverse set of video scenarios. Extensive experiments demonstrate that CiCVS establishes a new state-of-the-art on multiple benchmarks, achieving significant improvements in J&F scores, including +2.7 on CoCoRVOS, +1.4 on ReVOS, and +7.0 on ReasonVOS, underscoring its superior contextual reasoning and segmentation capabilities.
{"title":"ADANet: Adversarial Distribution Alignment Network for Multi-view Semi-supervised Classification.","authors":"Sujia Huang, Lele Fu, Zhaoliang Chen, Tong Zhang, Xiaoli Li, Zhen Cui","doi":"10.1109/TIP.2026.3689430","DOIUrl":"https://doi.org/10.1109/TIP.2026.3689430","url":null,"abstract":"<p><p>Multi-view learning aims to integrate multi-source information for a comprehensive data representation, which has gained widespread attention in image processing. Each view contains view-specific noise and joint features associated with other views, and thus exploring the specificity and consistency among views is a typical solution to deal with multi-view data for learning discriminative representations. In this paper, we present a theory-induced model, termed Adversarial Distribution Alignment Network (ADANet), which learns view-invariant features and alleviate the negative impact of view-specific noise. We first demonstrate the necessity of suppressing view-specific noise and capturing view-invariant features inspired by the theory of view generalization, and then derive two collaborative modules: a feature disentangler and an adversarial alignment module. In detail, the feature disentanglement separates view-specific noise and view-invariant features by minimizing the mutual information between them. Following this, a negative entropy is proposed to suppress the negative impact of view-specific noise. Meanwhile, the adversarial module uses the adversarial technique that can fit more complex data conformed to different distributions to adaptively align cross-view features so that features encoded in different views converge. Substantial experiments are constructed on multi-view datasets, demonstrating that ADANet can achieve more promising performance compared to other superior methods. Code is available at https://github.com/huangsuj/ADANet.</p>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"PP ","pages":""},"PeriodicalIF":13.7,"publicationDate":"2026-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147847672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Open Set Domain Adaptation via Target-relaxed Optimal Transport.","authors":"Chuan-Xian Ren, Zi-Xian Huang, Hong Yan","doi":"10.1109/TIP.2026.3689416","DOIUrl":"https://doi.org/10.1109/TIP.2026.3689416","url":null,"abstract":"<p><p>Open set domain adaptation (OSDA) aims to transfer classification-oriented knowledge from a labeled source domain to an unlabeled target domain, which faces the challenges from unseen knowledge in open-set scenarios, i.e., unknown classes privileged to the target domain. Existing methods usually identify unknown classes from classifier prediction directly, which are sensitive to the intrinsic clustering structure and cluster numbers of the unknown class data. In this paper, inspired by the sample relation characterization ability of Optimal Transport (OT), we propose a new type of OT method for OSDA, namely, Target-relaxed Optimal Transport (TROT). Compared with existing OT with strict marginal constraints, TROT imposes a single-side relaxation to the mass requirement on the open-set target domain. Theoretically, we prove that such a relaxation can reduce mis-matches between known and unknown classes, which indicates the transport plan of TROT is promising to identify unknown classes. Methodologically, TROT can identify unknown classes adaptively and map the cross-domain shared data with a sparse plan assignment, which improves both the effectiveness and robustness of known class alignment; besides, a graph embedding with multi-cluster structure of unknown classes is designed to learn a discriminative metric space for open-set classification. Empirically, extensive evaluations are conducted on several image datasets, where TROT achieves significant performance improvements compared with existing techniques for visual recognition in open-set scenarios.</p>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"PP ","pages":""},"PeriodicalIF":13.7,"publicationDate":"2026-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147847728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}