{"title":"Style Consistency Unsupervised Domain Adaptation Medical Image Segmentation","authors":"Lang Chen;Yun Bian;Jianbin Zeng;Qingquan Meng;Weifang Zhu;Fei Shi;Chengwei Shao;Xinjian Chen;Dehui Xiang","doi":"10.1109/TIP.2024.3451934","DOIUrl":"10.1109/TIP.2024.3451934","url":null,"abstract":"Unsupervised domain adaptation medical image segmentation is aimed to segment unlabeled target domain images with labeled source domain images. However, different medical imaging modalities lead to large domain shift between their images, in which well-trained models from one imaging modality often fail to segment images from anothor imaging modality. In this paper, to mitigate domain shift between source domain and target domain, a style consistency unsupervised domain adaptation image segmentation method is proposed. First, a local phase-enhanced style fusion method is designed to mitigate domain shift and produce locally enhanced organs of interest. Second, a phase consistency discriminator is constructed to distinguish the phase consistency of domain-invariant features between source domain and target domain, so as to enhance the disentanglement of the domain-invariant and style encoders and removal of domain-specific features from the domain-invariant encoder. Third, a style consistency estimation method is proposed to obtain inconsistency maps from intermediate synthesized target domain images with different styles to measure the difficult regions, mitigate domain shift between synthesized target domain images and real target domain images, and improve the integrity of interested organs. Fourth, style consistency entropy is defined for target domain images to further improve the integrity of the interested organ by the concentration on the inconsistent regions. Comprehensive experiments have been performed with an in-house dataset and a publicly available dataset. The experimental results have demonstrated the superiority of our framework over state-of-the-art methods.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"4882-4895"},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142142173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reference-Based Multi-Stage Progressive Restoration for Multi-Degraded Images","authors":"Yi Zhang;Qixue Yang;Damon M. Chandler;Xuanqin Mou","doi":"10.1109/TIP.2024.3451939","DOIUrl":"10.1109/TIP.2024.3451939","url":null,"abstract":"Image restoration (IR) via deep learning has been vigorously studied in recent years. However, due to the ill-posed nature of the problem, it is challenging to recover the high-quality image details from a single distorted input especially when images are corrupted by multiple distortions. In this paper, we propose a multi-stage IR approach for progressive restoration of multi-degraded images via transferring similar edges/textures from the reference image. Our method, called a Reference-based Image Restoration Transformer (Ref-IRT), operates via three main stages. In the first stage, a cascaded U-Transformer network is employed to perform the preliminary recovery of the image. The proposed network consists of two U-Transformer architectures connected by feature fusion of the encoders and decoders, and the residual image is estimated by each U-Transformer in an easy-to-hard and coarse-to-fine fashion to gradually recover the high-quality image. The second and third stages perform texture transfer from a reference image to the preliminarily-recovered target image to further enhance the restoration performance. To this end, a quality-degradation-restoration method is proposed for more accurate content/texture matching between the reference and target images, and a texture transfer/reconstruction network is employed to map the transferred features to the high-quality image. Experimental results tested on three benchmark datasets demonstrate the effectiveness of our model as compared with other state-of-the-art multi-degraded IR methods. Our code and dataset are available at \u0000<uri>https://vinelab.jp/refmdir/</uri>\u0000.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"4982-4997"},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142142170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Prompt-Enhanced Context Features for Weakly-Supervised Video Anomaly Detection","authors":"Yujiang Pu;Xiaoyu Wu;Lulu Yang;Shengjin Wang","doi":"10.1109/TIP.2024.3451935","DOIUrl":"10.1109/TIP.2024.3451935","url":null,"abstract":"Weakly supervised video anomaly detection aims to locate abnormal activities in untrimmed videos without the need for frame-level supervision. Prior work has utilized graph convolution networks or self-attention mechanisms alongside multiple instance learning (MIL)-based classification loss to model temporal relations and learn discriminative features. However, these approaches are limited in two aspects: 1) Multi-branch parallel architectures, while capturing multi-scale temporal dependencies, inevitably lead to increased parameter and computational costs. 2) The binarized MIL constraint only ensures the interclass separability while neglecting the fine-grained discriminability within anomalous classes. To this end, we introduce a novel WS-VAD framework that focuses on efficient temporal modeling and anomaly innerclass discriminability. We first construct a Temporal Context Aggregation (TCA) module that simultaneously captures local-global dependencies by reusing an attention matrix along with adaptive context fusion. In addition, we propose a Prompt-Enhanced Learning (PEL) module that incorporates semantic priors using knowledge-based prompts to boost the discrimination of visual features while ensuring separability across anomaly subclasses. The proposed components have been validated through extensive experiments, which demonstrate superior performance on three challenging datasets, UCF-Crime, XD-Violence and ShanghaiTech, with fewer parameters and reduced computational effort. Notably, our method can significantly improve the detection accuracy for certain anomaly subclasses and reduced the false alarm rate. Our code is available at: \u0000<uri>https://github.com/yujiangpu20/PEL4VAD</uri>\u0000.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"4923-4936"},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142142167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structural Relation Modeling of 3D Point Clouds","authors":"Yu Zheng;Jiwen Lu;Yueqi Duan;Jie Zhou","doi":"10.1109/TIP.2024.3451940","DOIUrl":"10.1109/TIP.2024.3451940","url":null,"abstract":"In this paper, we propose an effective plug-and-play module called structural relation network (SRN) to model structural dependencies in 3D point clouds for feature representation. Existing network architectures such as PointNet++ and RS-CNN capture local structures individually and ignore the inner interactions between different sub-clouds. Motivated by the fact that structural relation modeling plays critical roles for humans to understand 3D objects, our SRN exploits local information by modeling structural relations in 3D spaces. For a given sub-cloud of point sets, SRN firstly extracts its geometrical and locational relations with the other sub-clouds and maps them into the embedding space, then aggregates both relational features with the other sub-clouds. As the variation of semantics embedded in different sub-clouds is ignored by SRN, we further extend SRN to enable dynamic message passing between different sub-clouds. We propose a graph-based structural relation network (GSRN) where sub-clouds and their pairwise relations are modeled as nodes and edges respectively, so that the node features are updated by the messages along the edges. Since the node features might not be well preserved when acquiring the global representation, we propose a Combined Entropy Readout (CER) function to adaptively aggregate them into the holistic representation, so that GSRN simultaneously models the local-local and local-global region-wise interaction. The proposed SRN and GSRN modules are simple, interpretable, and do not require any additional supervision signals, which can be easily equipped with the existing networks. Experimental results on the benchmark datasets (ScanObjectNN, ModelNet40, ShapeNet Part, S3DIS, ScanNet and SUN-RGBD) indicate promising boosts on the tasks of 3D point cloud classification, segmentation and object detection.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"4867-4881"},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142142172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Balanced Destruction-Reconstruction Dynamics for Memory-Replay Class Incremental Learning","authors":"Yuhang Zhou;Jiangchao Yao;Feng Hong;Ya Zhang;Yanfeng Wang","doi":"10.1109/TIP.2024.3451932","DOIUrl":"10.1109/TIP.2024.3451932","url":null,"abstract":"Class incremental learning (CIL) aims to incrementally update a trained model with the new classes of samples (plasticity) while retaining previously learned ability (stability). To address the most challenging issue in this goal, i.e., catastrophic forgetting, the mainstream paradigm is memory-replay CIL, which consolidates old knowledge by replaying a small number of old classes of samples saved in the memory. Despite effectiveness, the inherent destruction-reconstruction dynamics in memory-replay CIL are an intrinsic limitation: if the old knowledge is severely destructed, it will be quite hard to reconstruct the lossless counterpart. Our theoretical analysis shows that the destruction of old knowledge can be effectively alleviated by balancing the contribution of samples from the current phase and those saved in the memory. Motivated by this theoretical finding, we propose a novel Balanced Destruction-Reconstruction module (BDR) for memory-replay CIL, which can achieve better knowledge reconstruction by reducing the degree of maximal destruction of old knowledge. Specifically, to achieve a better balance between old knowledge and new classes, the proposed BDR module takes into account two factors: the variance in training status across different classes and the quantity imbalance of samples from the current phase and memory. By dynamically manipulating the gradient during training based on these factors, BDR can effectively alleviate knowledge destruction and improve knowledge reconstruction. Extensive experiments on a range of CIL benchmarks have shown that as a lightweight plug-and-play module, BDR can significantly improve the performance of existing state-of-the-art methods with good generalization. Our code is publicly available here.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"4966-4981"},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142142131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generating Stylized Features for Single-Source Cross-Dataset Palmprint Recognition With Unseen Target Dataset","authors":"Huikai Shao;Pengxu Li;Dexing Zhong","doi":"10.1109/TIP.2024.3451933","DOIUrl":"10.1109/TIP.2024.3451933","url":null,"abstract":"As a promising topic in palmprint recognition, cross-dataset palmprint recognition is attracting more and more research interests. In this paper, a more difficult yet realistic scenario is studied, i.e., Single-Source Cross-Dataset Palmprint Recognition with Unseen Target dataset (S2CDPR-UT). It is aimed to generalize a palmprint feature extractor trained only on a single source dataset to multiple unseen target datasets collected by different devices or environments. To combat this challenge, we propose a novel method to improve the generalization of feature extractor for S2CDPR-UT, named Generating stylIzed FeaTures (GIFT). Firstly, the raw features are decoupled into high- and low- frequency components. Then, a feature stylization module is constructed to perturb the mean and variance of low-frequency components to generate more stylized features, which can provided more valuable knowledge. Furthermore, two diversity enhancement and consistency preservation supervisions are introduced at feature level to help to learn the model. The former is aimed to enhance the diversity of stylized features to expand the feature space. Meanwhile, the later is aimed to maintain the semantic consistency to ensure accurate palmprint recognition. Extensive experiments carried out on CASIA Multi-Spectral, XJTU-UP, and MPD palmprint databases show that our GIFT method can achieve significant improvement of performance over other methods. The codes will be released at \u0000<uri>https://github.com/HuikaiShao/GIFT</uri>\u0000.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"4911-4922"},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142142145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rethinking Self-Training for Semi-Supervised Landmark Detection: A Selection-Free Approach","authors":"Haibo Jin;Haoxuan Che;Hao Chen","doi":"10.1109/TIP.2024.3451937","DOIUrl":"10.1109/TIP.2024.3451937","url":null,"abstract":"Self-training is a simple yet effective method for semi-supervised learning, during which pseudo-label selection plays an important role for handling confirmation bias. Despite its popularity, applying self-training to landmark detection faces three problems: 1) The selected confident pseudo-labels often contain data bias, which may hurt model performance; 2) It is not easy to decide a proper threshold for sample selection as the localization task can be sensitive to noisy pseudo-labels; 3) coordinate regression does not output confidence, making selection-based self-training infeasible. To address the above issues, we propose Self-Training for Landmark Detection (STLD), a method that does not require explicit pseudo-label selection. Instead, STLD constructs a task curriculum to deal with confirmation bias, which progressively transitions from more confident to less confident tasks over the rounds of self-training. Pseudo pretraining and shrink regression are two essential components for such a curriculum, where the former is the first task of the curriculum for providing a better model initialization and the latter is further added in the later rounds to directly leverage the pseudo-labels in a coarse-to-fine manner. Experiments on three facial and one medical landmark detection benchmark show that STLD outperforms the existing methods consistently in both semi- and omni-supervised settings. The code is available at \u0000<uri>https://github.com/jhb86253817/STLD</uri>\u0000.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"4952-4965"},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142142171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Contrastive Open-Set Active Learning-Based Sample Selection for Image Classification","authors":"Zizheng Yan;Delian Ruan;Yushuang Wu;Junshi Huang;Zhenhua Chai;Xiaoguang Han;Shuguang Cui;Guanbin Li","doi":"10.1109/TIP.2024.3451928","DOIUrl":"10.1109/TIP.2024.3451928","url":null,"abstract":"In this paper, we address a complex but practical scenario in Active Learning (AL) known as open-set AL, where the unlabeled data consists of both in-distribution (ID) and out-of-distribution (OOD) samples. Standard AL methods will fail in this scenario as OOD samples are highly likely to be regarded as uncertain samples, leading to their selection and wasting of the budget. Existing methods focus on selecting the highly likely ID samples, which tend to be easy and less informative. To this end, we introduce two criteria, namely contrastive confidence and historical divergence, which measure the possibility of being ID and the hardness of a sample, respectively. By balancing the two proposed criteria, highly informative ID samples can be selected as much as possible. Furthermore, unlike previous methods that require additional neural networks to detect the OOD samples, we propose a contrastive clustering framework that endows the classifier with the ability to identify the OOD samples and further enhances the network’s representation learning. The experimental results demonstrate that the proposed method achieves state-of-the-art performance on several benchmark datasets.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"5525-5537"},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142142133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Correlation Learning and Regularization for Multi-Label Confidence Calibration","authors":"Tianshui Chen;Weihang Wang;Tao Pu;Jinghui Qin;Zhijing Yang;Jie Liu;Liang Lin","doi":"10.1109/TIP.2024.3448248","DOIUrl":"10.1109/TIP.2024.3448248","url":null,"abstract":"Modern visual recognition models often display overconfidence due to their reliance on complex deep neural networks and one-hot target supervision, resulting in unreliable confidence scores that necessitate calibration. While current confidence calibration techniques primarily address single-label scenarios, there is a lack of focus on more practical and generalizable multi-label contexts. This paper introduces the Multi-Label Confidence Calibration (MLCC) task, aiming to provide well-calibrated confidence scores in multi-label scenarios. Unlike single-label images, multi-label images contain multiple objects, leading to semantic confusion and further unreliability in confidence scores. Existing single-label calibration methods, based on label smoothing, fail to account for category correlations, which are crucial for addressing semantic confusion, thereby yielding sub-optimal performance. To overcome these limitations, we propose the Dynamic Correlation Learning and Regularization (DCLR) algorithm, which leverages multi-grained semantic correlations to better model semantic confusion for adaptive regularization. DCLR learns dynamic instance-level and prototype-level similarities specific to each category, using these to measure semantic correlations across different categories. With this understanding, we construct adaptive label vectors that assign higher values to categories with strong correlations, thereby facilitating more effective regularization. We establish an evaluation benchmark, re-implementing several advanced confidence calibration algorithms and applying them to leading multi-label recognition (MLR) models for fair comparison. Through extensive experiments, we demonstrate the superior performance of DCLR over existing methods in providing reliable confidence scores in multi-label scenarios.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"4811-4823"},"PeriodicalIF":0.0,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142121428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IdeNet: Making Neural Network Identify Camouflaged Objects Like Creatures","authors":"Juwei Guan;Xiaolin Fang;Tongxin Zhu;Zhipeng Cai;Zhen Ling;Ming Yang;Junzhou Luo","doi":"10.1109/TIP.2024.3449574","DOIUrl":"10.1109/TIP.2024.3449574","url":null,"abstract":"Camouflaged objects often blend in with their surroundings, making the perception of a camouflaged object a more complex procedure. However, most neural-network-based methods that simulate the visual information processing pathway of creatures only roughly define the general process, which deficiently reproduces the process of identifying camouflaged objects. How to make modeled neural networks perceive camouflaged objects as effectively as creatures is a significant topic that deserves further consideration. After meticulous analysis of biological visual information processing, we propose an end-to-end prudent and comprehensive neural network, termed IdeNet, to model the critical information processing. Specifically, IdeNet divides the entire perception process into five stages: information collection, information augmentation, information filtering, information localization, and information correction and object identification. In addition, we design tailored visual information processing mechanisms for each stage, including the information augmentation module (IAM), the information filtering module (IFM), the information localization module (ILM), and the information correction module (ICM), to model the critical visual information processing and establish the inextricable association of biological behavior and visual information processing. The extensive experiments show that IdeNet outperforms state-of-the-art methods in all benchmarks, demonstrating the effectiveness of the five-stage partitioning of visual information processing pathway and the tailored visual information processing mechanisms for camouflaged object detection. Our code is publicly available at: \u0000<uri>https://github.com/whyandbecause/IdeNet</uri>\u0000.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"4824-4839"},"PeriodicalIF":0.0,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142101322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}