IEEE Transactions on Image Processing (a publication of the IEEE Signal Processing Society): Latest Articles

Visual Quality Assessment of Composite Images: A Compression-Oriented Database and Measurement
Miaohui Wang; Zhuowei Xu; Xiaofang Zhang; Yuming Fang; Weisi Lin
DOI: 10.1109/TIP.2025.3550005 | IEEE Transactions on Image Processing, vol. 34, pp. 1849-1863 | Published: 2025-03-18

Abstract: Composite images (CIs) have experienced unprecedented growth, especially with the rise of generative AI technologies. They are usually created by combining multiple visual elements from different sources into a single cohesive composition, and they have a growing impact on a variety of vision applications. However, transmission can degrade the visual quality of CIs, especially when they undergo lossy compression to reduce bandwidth and storage. To facilitate the development of objective measurements for CIs and to investigate how compression distortions affect their perception, we establish a compression-oriented image quality assessment (CIQA) database for CIs (called ciCIQA) with 30 typical encoding distortions. Compressing with six representative codecs, we carried out a large-scale subjective experiment that delivered 3,000 encoded CIs with labeled quality scores, making ciCIQA one of the earliest CI databases and the one covering the most compression types. ciCIQA enables us to explore encoding effects on visual quality from the first five just noticeable difference (JND) points, offering insights for perceptual CI compression and related tasks. Moreover, we propose a new multi-masked no-reference CIQA method (called mmCIQA), comprising a multi-masked quality representation module, a self-supervised quality alignment module, and a multi-masked attentive fusion module. Experimental results demonstrate the outstanding performance of mmCIQA in assessing the quality of CIs, outperforming 17 competitive approaches. The proposed method and database, as well as the collected objective metrics, are publicly available at https://charwill.github.io/mmciqa.html.

Citations: 0
HSLabeling: Toward Efficient Labeling for Large-Scale Remote Sensing Image Segmentation With Hybrid Sparse Labeling
Jiaxing Lin; Zhen Yang; Qiang Liu; Yinglong Yan; Pedram Ghamisi; Weiying Xie; Leyuan Fang
DOI: 10.1109/TIP.2025.3550039 | IEEE Transactions on Image Processing, vol. 34, pp. 1864-1878 | Published: 2025-03-18

Abstract: Dense pixel-wise labeling of large-scale remote sensing images (RSI) is very time-consuming, while sparse labels (i.e., points, scribbles, or blocks) offer an efficient way to reduce labeling costs. Most existing sparse-label-based methods adopt only one type of label for image segmentation, which cannot reflect the complex land covers in RSI when training the model, leading to inferior segmentation performance. We observe that land covers with different shapes and complexity are optimally represented by different sparse labels. Inspired by this observation, we propose a novel sparse labeling framework, termed Hybrid Sparse Labeling (HSLabeling), for large-scale RSI segmentation. HSLabeling adaptively selects the optimal hybrid sparse labels for different land covers according to the labeling cost and segmentation contribution of each sparse label type. Specifically, we first propose a label segmentation contribution estimation module that estimates the information of different sparse labels according to the diversity and shape of land covers. We then propose an Optimal Hybrid Labeling Strategy (OHLS) to assign optimal label types for different land covers. In the OHLS, label assignment is formulated as an optimization problem that trades off segmentation contribution against labeling cost. We employ a greedy algorithm to solve the optimization problem efficiently and adaptively assign labels for varied land covers. Extensive experiments on three large-scale RSI datasets demonstrate that HSLabeling achieves almost fully supervised performance with extremely low labeling costs. In addition, compared with a single type of sparse label, HSLabeling reaches the same performance at a much lower labeling cost. The source code is available at https://github.com/linjiaxing99/HSLabeling.

Citations: 0
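The OHLS step described in the HSLabeling abstract is, at its heart, a budgeted trade-off between segmentation contribution and labeling cost solved greedily. A minimal sketch of that idea, where the label types, contribution scores, costs, and budget are all illustrative placeholders rather than values from the paper:

```python
# Hypothetical sketch of greedy hybrid-label assignment: for each land-cover
# region, pick the sparse label type with the best contribution-to-cost ratio
# until the labeling budget is exhausted. One label type per region.

def greedy_label_assignment(regions, budget):
    """regions: list of dicts mapping label type -> (contribution, cost)."""
    # Flatten all (region, label) candidates and rank by contribution per cost.
    candidates = []
    for i, options in enumerate(regions):
        for label, (info, cost) in options.items():
            candidates.append((info / cost, i, label, info, cost))
    candidates.sort(reverse=True)

    assignment, spent = {}, 0.0
    for ratio, i, label, info, cost in candidates:
        if i in assignment:          # region already has a label type
            continue
        if spent + cost <= budget:
            assignment[i] = label
            spent += cost
    return assignment, spent

regions = [
    {"point": (1.0, 1.0), "scribble": (2.5, 2.0)},   # elongated cover
    {"point": (0.8, 1.0), "block": (3.0, 4.0)},      # large uniform cover
]
assignment, spent = greedy_label_assignment(regions, budget=5.0)
```

The ratio-based ordering is the standard greedy heuristic for this kind of budgeted selection; the paper's contribution-estimation module would supply the `(contribution, cost)` pairs.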
Zerotree Coding of Subdivision Wavelet Coefficients in Dynamic Time-Varying Meshes
Maja Krivokuća; Tomás M. Borges; Ricardo L. de Queiroz
DOI: 10.1109/TIP.2025.3549998 | IEEE Transactions on Image Processing, vol. 34, pp. 1810-1819 | Published: 2025-03-17

Abstract: We propose a complete system to enable progressive coding with quality scalability of mesh geometry in MPEG's state-of-the-art Video-based Dynamic Mesh Coding (V-DMC) framework. In particular, we propose an alternative method for encoding the subdivision wavelet coefficients in V-DMC, using a zerotree coding approach that works directly in the native 3D mesh space. This allows us to identify parent-child relationships among the wavelet coefficients across different subdivision levels, which can be used to achieve an efficient and versatile coding mechanism. We demonstrate that, given a starting base mesh, a target subdivision surface, and a desired maximum number of zerotree passes, our system produces an elegant and visually attractive lossy-to-lossless mesh geometry reconstruction with no further user intervention. Moreover, lossless coefficient encoding with our approach requires nearly the same bitrate as the default displacement coding methods in V-DMC, yet our approach provides several quality resolution levels embedded in the same bitstream, while the current V-DMC solutions encode only a single quality level. To the best of our knowledge, this is the first time a zerotree-based method has been proposed and demonstrated to work for the compression of dynamic time-varying meshes, and the first time an embedded quality-scalable approach has been used in the V-DMC framework.

Citations: 0
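The parent-child significance test at the core of any zerotree coder (shown here in the classic EZW sense, not with the paper's mesh-specific data structures) can be sketched as:

```python
# Zerotree idea in miniature: if a coefficient and every descendant at finer
# levels are all insignificant against the current threshold, the entire
# subtree can be coded with a single zerotree-root symbol. The toy tree below
# is illustrative; V-DMC's actual parent-child links follow mesh subdivision.

def is_zerotree_root(coeffs, children, node, threshold):
    """True if `node` and all of its descendants are insignificant."""
    if abs(coeffs[node]) >= threshold:
        return False
    return all(is_zerotree_root(coeffs, children, child, threshold)
               for child in children.get(node, []))

coeffs = {0: 3.0, 1: 1.0, 2: 0.5, 3: 0.2, 4: 0.1}
children = {0: [1, 2], 1: [3, 4]}  # coarse level -> finer-level children

root_is_zerotree = is_zerotree_root(coeffs, children, 0, threshold=2.0)
subtree_is_zerotree = is_zerotree_root(coeffs, children, 1, threshold=2.0)
```

Coding whole insignificant subtrees with one symbol, then successively halving the threshold, is what makes the bitstream embedded and quality-scalable in the way the abstract describes.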
Equivariant Local Reference Frames With Optimization for Robust Non-Rigid Point Cloud Correspondence
Ling Wang; Runfa Chen; Fuchun Sun; Xinzhou Wang; Kai Sun; Chengliang Zhong; Guangyuan Fu; Yikai Wang
DOI: 10.1109/TIP.2025.3550006 | IEEE Transactions on Image Processing, vol. 34, pp. 1980-1994 | Published: 2025-03-17

Abstract: Unsupervised non-rigid point cloud shape correspondence underpins a multitude of 3D vision tasks, yet it is itself non-trivial given the exponential complexity stemming from inter-point degrees of freedom, i.e., pose transformations. Based on the assumption of local rigidity, one solution for reducing complexity is to decompose the overall shape into independent local regions using Local Reference Frames (LRFs) that are equivariant to SE(3) transformations. However, focusing solely on local structure neglects global geometric context, resulting in less distinctive LRFs that lack the semantic information necessary for effective matching. Furthermore, such complexity introduces out-of-distribution geometric contexts during inference, complicating generalization. To this end, we introduce 1) EquiShape, a novel structure tailored to learn pair-wise LRFs with global structural cues for both spatial and semantic consistency, and 2) LRF-Refine, an optimization strategy generally applicable to LRF-based methods, aimed at addressing the generalization challenge. Specifically, for EquiShape, we employ cross-talk within separate equivariant graph neural networks (Cross-GVP) to build long-range dependencies that compensate for the lack of semantic information in local structure modeling, deducing pair-wise independent SE(3)-equivariant LRF vectors for each point. For LRF-Refine, the optimization adjusts LRFs within specific contexts and knowledge, enhancing the geometric and semantic generalizability of point features. Our overall framework surpasses state-of-the-art methods by a large margin on three benchmarks. Code is available at https://github.com/2019EPWL/EquiShape.

Citations: 0
Event-Based Video Reconstruction With Deep Spatial-Frequency Unfolding Network
Chengjie Ge; Xueyang Fu; Kunyu Wang; Zheng-Jun Zha
DOI: 10.1109/TIP.2025.3550008 | IEEE Transactions on Image Processing, vol. 34, pp. 1779-1794 | Published: 2025-03-17

Abstract: Current event-based video reconstruction methods, limited to the spatial domain, face two challenges: decoupling brightness and structural information, which leads to exposure distortion, and efficiently acquiring non-local information without relying on computationally expensive Transformer models. To address these issues, we propose the Deep Spatial-Frequency Unfolding Reconstruction Network (DSFURNet), which explores and exploits frequency-domain knowledge for event-based video reconstruction. Specifically, we construct a variational model with three regularization terms: a brightness term approximated by Fourier amplitudes, a structural term approximated by Fourier phases, and an initialization term that converts event representations into initial video frames. We then design corresponding spatial-frequency-domain approximation operators for each regularization term. Benefiting from the global nature of frequency-domain computation, these operators integrate local spatial and global frequency information at a lower computational cost. Furthermore, we combine the learned knowledge of the three regularization terms and unfold the optimization algorithm into an iterative deep network. In this way, the pixel-level initialization constraint and the frequency-domain brightness and structural constraints act continuously at test time, gradually improving the quality of the reconstructed video frames. Compared to existing methods, our network significantly reduces the number of network parameters while improving evaluation metrics.

Citations: 0
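The brightness/structure split that DSFURNet's first two regularization terms rely on rests on a standard signal-processing fact: the Fourier amplitude spectrum carries intensity and contrast, while the phase spectrum carries structure. A self-contained NumPy check of that decomposition (this demonstrates the underlying fact only, not the paper's learned operators):

```python
import numpy as np

def split_amplitude_phase(img):
    """Decompose an image into its Fourier amplitude and phase spectra."""
    spectrum = np.fft.fft2(img)
    return np.abs(spectrum), np.angle(spectrum)

def combine(amplitude, phase):
    """Recombine amplitude and phase spectra into a spatial-domain image."""
    return np.real(np.fft.ifft2(amplitude * np.exp(1j * phase)))

# A simple intensity ramp as a toy image.
img = np.outer(np.linspace(0.0, 1.0, 8), np.linspace(0.0, 1.0, 8))
amplitude, phase = split_amplitude_phase(img)

reconstructed = combine(amplitude, phase)      # lossless round trip
brightened = combine(2.0 * amplitude, phase)   # scaling amplitude scales brightness
```

Because the transform is linear, scaling the amplitude while fixing the phase rescales brightness without altering structure, which is exactly the decoupling the two regularizers exploit.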
FRNet: Frustum-Range Networks for Scalable LiDAR Segmentation
Xiang Xu; Lingdong Kong; Hui Shuai; Qingshan Liu
DOI: 10.1109/TIP.2025.3550011 | IEEE Transactions on Image Processing, vol. 34, pp. 2173-2186 | Published: 2025-03-17

Abstract: LiDAR segmentation has become a crucial component of advanced autonomous driving systems. Recent range-view LiDAR segmentation approaches show promise for real-time processing, but they inevitably suffer from corrupted contextual information and rely heavily on post-processing techniques for prediction refinement. In this work, we propose FRNet, a simple yet powerful method that restores the contextual information of range-image pixels using the corresponding frustum LiDAR points. First, a frustum feature encoder module extracts per-point features within each frustum region, preserving the scene consistency that is critical for point-level predictions. Next, a frustum-point fusion module updates per-point features hierarchically, enabling each point to gather more surrounding information through the frustum features. Finally, a head fusion module fuses features at different levels for the final semantic predictions. Extensive experiments on four popular LiDAR segmentation benchmarks under various task setups demonstrate the superiority of FRNet. Notably, FRNet achieves 73.3% and 82.5% mIoU on the test sets of SemanticKITTI and nuScenes, respectively. While achieving competitive performance, FRNet runs 5 times faster than state-of-the-art approaches. Such high efficiency opens up new possibilities for more scalable LiDAR segmentation. The code is publicly available at https://github.com/Xiangxu-0103/FRNet.

Citations: 0
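A range-view pipeline like FRNet's starts from the standard spherical projection of LiDAR points onto range-image pixels; points falling in the same column band then form a frustum. A hedged sketch of that projection, where the vertical field-of-view limits and image resolution are common SemanticKITTI-style defaults, not necessarily FRNet's configuration:

```python
import numpy as np

def range_projection(points, h=64, w=2048, fov_up=3.0, fov_down=-25.0):
    """Project 3D LiDAR points (n, 3) onto (row, col) range-image pixels."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)              # azimuth in [-pi, pi] -> column
    pitch = np.arcsin(z / depth)        # inclination -> row
    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    col = (0.5 * (yaw / np.pi + 1.0) * w).astype(int) % w
    row = ((fov_up_r - pitch) / (fov_up_r - fov_down_r)) * h
    row = np.clip(row.astype(int), 0, h - 1)
    return row, col, depth

pts = np.array([[10.0, 0.0, 0.0],    # straight ahead, level with the sensor
                [0.0, 10.0, 0.0]])   # 90 degrees to the left, level
row, col, depth = range_projection(pts)
```

Grouping pixels by contiguous column ranges then yields frustum regions in which the per-point features can be pooled; the paper defines its own grouping and fusion on top of this projection.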
Raformer: Redundancy-Aware Transformer for Video Wire Inpainting
Zhong Ji; Yimu Su; Yan Zhang; Jiacheng Hou; Yanwei Pang; Jungong Han
DOI: 10.1109/TIP.2025.3550038 | IEEE Transactions on Image Processing, vol. 34, pp. 1795-1809 | Published: 2025-03-17

Abstract: Video Wire Inpainting (VWI) is a prominent application of video inpainting aimed at flawlessly removing wires from films or TV series, offering significant time and labor savings over manual frame-by-frame removal. However, wire removal poses greater challenges than general video inpainting: wires are longer and slimmer than typically targeted objects, and they often intersect people and background objects irregularly, which adds complexity to the inpainting process. Recognizing the limitations of existing video wire datasets, which are characterized by small size, poor quality, and a limited variety of scenes, we introduce a new VWI dataset with a novel mask generation strategy, namely the Wire Removal Video Dataset 2 (WRV2) with Pseudo Wire-Shaped (PWS) masks. The WRV2 dataset comprises over 4,000 videos with an average length of 80 frames, designed to facilitate the development and evaluation of inpainting models. Building on this, we propose the Redundancy-Aware Transformer (Raformer), which addresses the unique challenges of wire removal in video inpainting. Unlike conventional approaches that indiscriminately process all frame patches, Raformer employs a novel strategy to selectively bypass redundant parts, such as static background segments devoid of information valuable for inpainting. At the core of Raformer is the Redundancy-Aware Attention (RAA) module, which isolates and accentuates essential content through a coarse-grained, window-based attention mechanism. This is complemented by a Soft Feature Alignment (SFA) module, which refines these features and achieves end-to-end feature alignment. Extensive experiments on both traditional video inpainting datasets and our WRV2 dataset demonstrate that Raformer outperforms other state-of-the-art methods. Our code and the WRV2 dataset will be made available at https://github.com/Suyimu/WRV2.

Citations: 0
Multi-Label Auroral Image Classification Based on CNN and Transformer
Hang Su; Qiuju Yang; Yixuan Ning; Zejun Hu; Lili Liu
DOI: 10.1109/TIP.2025.3550003 | IEEE Transactions on Image Processing, vol. 34, pp. 1835-1848 | Published: 2025-03-17

Abstract: Auroral image classification has long been a focus of research in auroral physics. However, current methods for automatic auroral classification typically assume that only one type of aurora is present in an image. This assumption neglects the complex transition states and the coexistence of multiple types during auroral evolution, limiting the exploration of the intricate semantics of auroral images. To fully exploit the physical information embedded in auroral images, this paper proposes a multi-label auroral classification method, termed MLAC, which integrates convolutional neural network (CNN) and Transformer architectures. First, we introduce a multi-scale feature fusion framework that enables the model to capture both fine-grained features and high-level information in auroral images, yielding a more comprehensive representation of auroral features. Second, we propose a lightweight multi-head self-attention mechanism that captures long-range dependencies between pixels during multi-scale feature fusion, which is crucial for distinguishing subtle differences between auroral types. Furthermore, we design a residual focused multilayer perceptron module that integrates large-kernel depth-wise convolution with an improved multilayer perceptron, enhancing the model's ability to represent complex spatial structure and thereby improving local feature extraction and global contextual understanding. The proposed method achieves a mean average precision (mAP) of 88.20% on auroral observation data collected at the Yellow River Station from 2003 to 2008, significantly surpassing the most advanced multi-label classification models while maintaining competitive computational efficiency. Moreover, our method also outperforms state-of-the-art multi-label methods in both computational efficiency and classification accuracy on two publicly available multi-label image datasets, WIDER-Attribute and VOC2007. These results demonstrate that our method skillfully leverages the robust local feature extraction of CNNs and the superior global information processing of Transformers.

Citations: 0
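For reference, the textbook multi-head self-attention computation that MLAC's lightweight variant streamlines looks as follows; this is the standard form, not the paper's mechanism, and the shapes are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, wq, wk, wv, heads):
    """x: (n_tokens, d_model); each head attends over a d_model/heads slice."""
    n, d = x.shape
    dh = d // heads
    q, k, v = x @ wq, x @ wk, x @ wv
    out = np.empty_like(x)
    for h in range(heads):
        s = slice(h * dh, (h + 1) * dh)
        scores = q[:, s] @ k[:, s].T / np.sqrt(dh)   # (n, n) per head
        out[:, s] = softmax(scores) @ v[:, s]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
wq, wk, wv = (rng.standard_normal((8, 8)) for _ in range(3))
y = multi_head_self_attention(x, wq, wk, wv, heads=2)
```

The cost driver is the dense (n, n) score matrix per head; "lightweight" variants typically shrink it, e.g. by windowing or low-rank projection, which is the kind of reduction the abstract alludes to.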
Cross-Modal Causal Representation Learning for Radiology Report Generation
Weixing Chen; Yang Liu; Ce Wang; Jiarui Zhu; Guanbin Li; Cheng-Lin Liu; Liang Lin
DOI: 10.1109/TIP.2025.3568746 | IEEE Transactions on Image Processing, vol. 34, pp. 2970-2985 | Published: 2025-03-16

Abstract: Radiology report generation (RRG) is essential for computer-aided diagnosis and medication guidance; it can relieve the heavy burden on radiologists by automatically generating the radiology report corresponding to a given radiology image. However, generating accurate lesion descriptions remains challenging due to spurious correlations from visual-linguistic biases and the inherent limitations of radiological imaging, such as low resolution and noise interference. To address these issues, we propose a two-stage framework named Cross-Modal Causal Representation Learning (CMCRL), consisting of Radiological Cross-modal Alignment and Reconstruction Enhanced (RadCARE) pre-training and Visual-Linguistic Causal Intervention (VLCI) fine-tuning. In the pre-training stage, RadCARE introduces a degradation-aware masked image restoration strategy tailored to radiological images, which reconstructs high-resolution patches from low-resolution inputs to mitigate noise and detail loss. Combined with a multiway architecture and four adaptive training strategies (e.g., text postfix generation with degraded images and text prefixes), RadCARE establishes robust cross-modal correlations even with incomplete data. In the VLCI phase, we deploy causal front-door intervention through two modules: the Visual Deconfounding Module (VDM) disentangles local and global features without fine-grained annotations, while the Linguistic Deconfounding Module (LDM) eliminates context bias without external terminology databases. Experiments on IU-Xray and MIMIC-CXR show that the CMCRL pipeline significantly outperforms state-of-the-art methods, with ablation studies confirming the necessity of both stages. Code and models are available at https://github.com/WissingChen/CMCRL.

Citations: 0
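The front-door intervention that VLCI deploys is, in Pearl's standard formulation, the adjustment below; the variable naming is ours for illustration, with $X$ the input image/text, $M$ a mediator of the kind the deconfounding modules extract, and $Y$ the generated report:

```latex
P(Y \mid do(X = x)) \;=\; \sum_{m} P(m \mid x) \sum_{x'} P(Y \mid x', m)\, P(x')
```

The outer sum marginalizes over the mediator, and the inner sum re-weights by the prior over inputs, blocking the confounded path without requiring the confounder itself to be observed, which is why the abstract can claim deconfounding "without fine-grained annotations" or "external terminology databases".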
A Generalized Non-Convex Surrogated Framework for Anomaly Detection on Blurred Hyperspectral Images
Yinjian Wang; Wei Li; Yuanyuan Gui; Haijun Xie; Lianbo Zhang
DOI: 10.1109/TIP.2025.3568745 | IEEE Transactions on Image Processing, vol. 34, pp. 3108-3122 | Published: 2025-03-16

Abstract: Hyperspectral imaging offers outstanding discriminability between different land types through its comprehensive sensing of the spectrum, and is therefore well suited to anomaly detection. However, blurring, a critical cause of quality deterioration in hyperspectral imaging, has been overlooked by previous hyperspectral anomaly detection models. On one hand, given that anomalies are sparsely distributed in nature, a blurring effect that entangles neighboring pixels severely degrades those detection models. On the other hand, abnormal objects jeopardize the low-dimensional structure of the image, so deblurring images that contain anomalies is more challenging than deblurring normal ones. Hence, it is of much significance to investigate anomaly detection using blurred hyperspectral images. To this end, this paper proposes a generalized non-convex surrogated tensor framework that performs anomaly detection robustly under blurring effects on hyperspectral images. The proposed framework is a unified paradigm that guarantees convergence for a broad class of non-convex surrogates. By treating spatial and spectral low-rankness adaptively via Block Term Decomposition, the unevenness in the multi-linear low-rankness of the hyperspectral image is comprehensively considered, which, together with the non-convex surrogates, yields a tighter modeling of the low-dimensional prior of hyperspectral images. Extensive experiments demonstrate the superiority of the proposed method over state-of-the-art methods on both hyperspectral image deblurring and anomaly detection.

Citations: 0