{"title":"Learning hyperspectral noisy label with global and local hypergraph laplacian energy","authors":"Cheng Shi , Linfeng Lu , Minghua Zhao , Xinhong Hei , Chi-Man Pun , Qiguang Miao","doi":"10.1016/j.patcog.2025.111606","DOIUrl":"10.1016/j.patcog.2025.111606","url":null,"abstract":"<div><div>Deep learning has achieved significant advancements in hyperspectral image (HSI) classification, yet it is highly dependent on the availability of high-quality labeled data. However, acquiring such labeled data for HSIs is often challenging due to the associated high costs and complexity. Consequently, the issue of classifying HSIs with noisy labels has garnered increasing attention. To address the negative effects of noisy labels, various methods have employed label correction strategies and have demonstrated promising results. Nevertheless, these techniques typically rely on correcting labels based on small-loss samples or neighborhood similarity. In high-noise environments, such methods often face unstable training processes, and the unreliability of neighborhood samples restricts their effectiveness. To overcome these limitations, this paper proposes a label correction method designed to address noisy labels in HSI classification by leveraging both global and local hypergraph structures to estimate label confidence and correct mislabeled samples. In contrast to traditional graph-based approaches, hypergraphs are capable of capturing higher-order relationships among samples, thereby improving the accuracy of label correction. The proposed method minimizes both global and local hypergraph Laplacian energies to enhance label consistency and accuracy across the dataset. Furthermore, contrastive learning and the Mixup technique are integrated to bolster the robustness and discriminative capabilities of HSI classification networks. 
Extensive experiments conducted on four publicly available hyperspectral datasets — University of Pavia (UP), Salinas Valley (SV), Kennedy Space Center (KSC), and WHU-Hi-HanChuan (HC) — demonstrate the superior performance of the proposed method, particularly in scenarios characterized by high levels of noise, where substantial improvements in classification accuracy are observed. The code is available at <span><span>https://github.com/AAAA-CS/GLHLE</span></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111606"},"PeriodicalIF":7.5,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143739915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
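As a companion to the abstract above, the following minimal sketch shows the standard normalized hypergraph Laplacian and its label-smoothness energy trace(Yᵀ L Y), the quantity that energy-minimization label correction relies on: flipping a label inside an otherwise homogeneous hyperedge raises the energy. This is a generic illustration, not the paper's GLHLE method; the incidence matrix `H`, the unit hyperedge weights, and the toy labels are all hypothetical.

```python
import numpy as np

def hypergraph_laplacian(H, w=None):
    """Normalized hypergraph Laplacian L = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}.
    H: (n_vertices, n_edges) incidence matrix; w: hyperedge weights.
    Assumes every vertex belongs to at least one hyperedge."""
    n, m = H.shape
    w = np.ones(m) if w is None else np.asarray(w, float)
    dv = H @ w                        # vertex degrees
    de = H.sum(axis=0)                # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    De_inv = np.diag(1.0 / de)
    Theta = Dv_inv_sqrt @ H @ np.diag(w) @ De_inv @ H.T @ Dv_inv_sqrt
    return np.eye(n) - Theta

def laplacian_energy(L, Y):
    """Label-smoothness energy trace(Y^T L Y); lower means labels agree
    more within hyperedges."""
    return float(np.trace(Y.T @ L @ Y))

# toy example: 4 samples, 2 hyperedges grouping {0,1,2} and {2,3}
H = np.array([[1, 0],
              [1, 0],
              [1, 1],
              [0, 1]], float)
L = hypergraph_laplacian(H)
Y_consistent = np.array([[1, 0], [1, 0], [1, 0], [0, 1]], float)  # one-hot labels
Y_noisy      = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], float)  # sample 1 flipped
assert laplacian_energy(L, Y_noisy) > laplacian_energy(L, Y_consistent)
```

A label-correction scheme in this spirit would flag samples whose label flip lowers the energy; the paper additionally combines global and local hypergraphs and integrates contrastive learning and Mixup, none of which is modeled here.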
{"title":"Dynamic accumulated attention map for interpreting evolution of decision-making in vision transformer","authors":"Yi Liao , Yongsheng Gao , Weichuan Zhang","doi":"10.1016/j.patcog.2025.111607","DOIUrl":"10.1016/j.patcog.2025.111607","url":null,"abstract":"<div><div>Various Vision Transformer (ViT) models have been widely used for image recognition tasks. However, existing visual explanation methods cannot display the attention flow hidden inside the inner structure of ViT models, which explains how the final attention regions are formed inside a ViT for its decision-making. In this paper, a novel visual explanation approach, Dynamic Accumulated Attention Map (DAAM), is proposed to provide a tool that can visualize, for the first time, the attention flow from the top to the bottom through ViT networks. To this end, a novel decomposition module is proposed to construct and store the spatial feature information by unlocking the [class] token generated by the self-attention module of each ViT block. The module can also obtain the channel importance coefficients by decomposing the classification score for supervised ViT models. Because of the lack of classification score in self-supervised ViT models, we propose dimension-wise importance weights to compute the channel importance coefficients. Such spatial features are linearly combined with the corresponding channel importance coefficients, forming the attention map for each block. The dynamic attention flow is revealed by block-wisely accumulating each attention map. The contribution of this work focuses on visualizing the evolution dynamic of the decision-making attention for any intermediate block inside a ViT model by proposing a novel decomposition module and dimension-wise importance weights. 
The quantitative and qualitative analyses consistently validate the effectiveness and superior capacity of the proposed DAAM for interpreting not only ViT models with fully-connected (FC) layers as the classifier but also self-supervised ViT models. The code is available at <span><span>https://github.com/ly9802/DynamicAccumulatedAttentionMap</span></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111607"},"PeriodicalIF":7.5,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143815697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
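The block-wise accumulation step described above (the final step of DAAM, after the decomposition module has produced one attention map per ViT block) can be sketched as a running, renormalized sum. The decomposition module and channel importance coefficients are not reproduced here; `block_maps` is an assumed input of already-computed, non-negative per-block maps.

```python
import numpy as np

def accumulate_attention(block_maps):
    """Block-wise accumulation of per-block attention maps: the map at
    depth k is the running sum of maps from block 0..k, rescaled to [0, 1].
    block_maps: list of (H, W) non-negative arrays, one per ViT block."""
    acc = np.zeros_like(block_maps[0], dtype=float)
    daam = []
    for m in block_maps:
        acc = acc + m
        norm = acc / acc.max() if acc.max() > 0 else acc  # rescale to [0, 1]
        daam.append(norm)
    return daam  # one accumulated map per intermediate block

# toy: three 2x2 "attention maps" that progressively focus on pixel (0, 0)
maps = [np.array([[1., 1.], [1., 1.]]),
        np.array([[2., 1.], [0., 0.]]),
        np.array([[4., 0.], [0., 0.]])]
flow = accumulate_attention(maps)
assert len(flow) == 3
assert flow[-1][0, 0] == 1.0   # the final map peaks at the attended pixel
```

Visualizing `flow[k]` for increasing `k` is what reveals the "attention flow" evolving through the network's depth.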
{"title":"Cascade residual learning based adaptive feature aggregation for light field super-resolution","authors":"Hao Zhang , Wenhui Zhou , Lili Lin , Andrew Lumsdaine","doi":"10.1016/j.patcog.2025.111616","DOIUrl":"10.1016/j.patcog.2025.111616","url":null,"abstract":"<div><div>Light field (LF) super-resolution aims to enhance the spatial or angular resolutions of LF images. Most existing methods tend to decompose 4D LF images into multiple 2D subspaces such as spatial, angular, and epipolar plane image (EPI) domains, and devote efforts to designing various feature extractors for each subspace domain. However, it remains challenging to select an effective multi-domain feature fusion strategy, including the fusion order and structure. To this end, this paper proposes an adaptive feature aggregation framework based on cascade residual learning, which can adaptively select feature aggregation strategies through learning rather than designed artificially. Specifically, we first employ three types of 2D feature extractors for spatial, angular, and EPI feature extraction, respectively. Then, an adaptive feature aggregation (AFA) module is designed to cascade these feature extractors through multi-level residual connections. This design enables the network to flexibly aggregate various subspace features without introducing additional parameters. We conduct comprehensive experiments on both real-world and synthetic LF datasets for light field spatial super-resolution (LFSSR) and light field angular super-resolution (LFASR). Quantitative and visual comparisons demonstrate that our model achieves state-of-the-art super-resolution (SR) performance. 
The code is available at <span><span>https://github.com/haozhang25/AFA-LFSR</span></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111616"},"PeriodicalIF":7.5,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143697607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
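The multi-level residual cascade idea behind the AFA module can be sketched in a few lines: each subspace branch adds a learned residual to the running feature, so the network can effectively skip or emphasize a branch without extra parameters. The toy linear "extractors" below are stand-ins for the paper's spatial/angular/EPI feature extractors, not its actual architecture.

```python
import numpy as np

def cascade_aggregate(x, extractors):
    """Cascade-residual aggregation sketch: each subspace feature extractor
    refines the running feature via a residual connection, so a branch the
    data does not need can be down-weighted by learning a small residual."""
    out = x
    for f in extractors:
        out = out + f(out)   # multi-level residual connection
    return out

# toy "extractors": small linear maps standing in for subspace branches
rng = np.random.default_rng(0)
Ws = [rng.normal(scale=0.1, size=(8, 8)) for _ in range(3)]
branches = [lambda v, W=W: v @ W for W in Ws]  # W=W pins each weight matrix
x = rng.normal(size=(4, 8))
y = cascade_aggregate(x, branches)
assert y.shape == x.shape
```

With no branches the cascade is the identity, which is exactly the property that makes the fusion order learnable rather than hand-designed.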
{"title":"Class activation map guided level sets for weakly supervised semantic segmentation","authors":"Yifan Wang , Gerald Schaefer , Xiyao Liu , Jing Dong , Linglin Jing , Ye Wei , Xianghua Xie , Hui Fang","doi":"10.1016/j.patcog.2025.111566","DOIUrl":"10.1016/j.patcog.2025.111566","url":null,"abstract":"<div><div>Weakly supervised semantic segmentation (WSSS) aims to achieve pixel-level fine-grained image segmentation using only weak guidance such as image-level class labels, thus significantly decreasing annotation costs. Despite the impressive performance showcased by current state-of-the-art WSSS approaches, the lack of precise object localisation limits their segmentation accuracy, especially for pixels close to object boundaries. To address this issue, we propose a novel class activation map (CAM)-based level set method to effectively improve the quality of pseudo-labels by exploring the capability of level sets to enhance the segmentation accuracy at object boundaries. To speed up the level set evolution process, we use Fourier neural operators to simulate the dynamic evolution of our level set method. Extensive experimental results show that our approach significantly outperforms existing WSSS methods on both PASCAL VOC 2012 and MS COCO datasets.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111566"},"PeriodicalIF":7.5,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143768430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FILP-3D: Enhancing 3D few-shot class-incremental learning with pre-trained vision-language models","authors":"Wan Xu , Tianyu Huang , Tianyuan Qu , Guanglei Yang , Yiwen Guo , Wangmeng Zuo","doi":"10.1016/j.patcog.2025.111558","DOIUrl":"10.1016/j.patcog.2025.111558","url":null,"abstract":"<div><div>Few-shot class-incremental learning (FSCIL) aims to mitigate the catastrophic forgetting issue when a model is incrementally trained on limited data. However, many of these works lack effective exploration of prior knowledge, rendering them unable to effectively address the domain gap issue in the context of 3D FSCIL, thereby leading to catastrophic forgetting. The Contrastive Vision-Language Pre-Training (CLIP) model serves as a highly suitable backbone for addressing the challenges of 3D FSCIL due to its abundant shape-related prior knowledge. Unfortunately, its direct application to 3D FSCIL still faces the incompatibility between 3D data representation and the 2D features, primarily manifested as feature space misalignment and significant noise. To address the above challenges, we introduce the FILP-3D framework with two novel components: the Redundant Feature Eliminator (RFE) for feature space misalignment and the Spatial Noise Compensator (SNC) for significant noise. RFE aligns the feature spaces of input point clouds and their embeddings by performing a unique dimensionality reduction on the feature space of pre-trained models (PTMs), effectively eliminating redundant information without compromising semantic integrity. On the other hand, SNC is a graph-based 3D model designed to capture robust geometric information within point clouds, thereby augmenting the knowledge lost due to projection, particularly when processing real-world scanned data. Moreover, traditional accuracy metrics are proven to be biased due to the imbalance in existing 3D datasets. 
Therefore, we propose a 3D FSCIL benchmark, FSCIL3D-XL, and novel evaluation metrics that offer a more nuanced assessment of a 3D FSCIL model. Experimental results on both established and our proposed benchmarks demonstrate that our approach significantly outperforms existing state-of-the-art methods. Code is available at: <span><span>https://github.com/HIT-leaderone/FILP-3D</span></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111558"},"PeriodicalIF":7.5,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143714776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
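The abstract does not specify the Redundant Feature Eliminator's dimensionality reduction, so the sketch below uses plain PCA as a stand-in to show the general mechanism: linearly dependent embedding dimensions collapse onto a small number of informative directions. The function name, variance threshold, and toy data are illustrative assumptions.

```python
import numpy as np

def eliminate_redundant_dims(feats, var_ratio=0.95):
    """PCA-style sketch of redundant-feature elimination: keep the smallest
    number of principal directions explaining `var_ratio` of the variance
    of pre-trained-model embeddings. feats: (n_samples, d) matrix."""
    mu = feats.mean(axis=0)
    X = feats - mu
    # SVD of the centered embeddings; singular values give per-direction variance
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    var = S**2 / np.sum(S**2)
    k = int(np.searchsorted(np.cumsum(var), var_ratio) + 1)
    W = Vt[:k]                     # (k, d) projection dropping redundant dims
    return (feats - mu) @ W.T, W

# toy embeddings: 3 informative dims plus 5 linear (redundant) combinations
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
feats = np.hstack([base, base @ rng.normal(size=(3, 5))])
reduced, W = eliminate_redundant_dims(feats, var_ratio=0.99)
assert reduced.shape[1] <= 4   # 8-dim embeddings collapse to a few informative dims
```

In the paper's setting the same principle is applied to CLIP features so that noisy, redundant dimensions do not dominate the 3D-to-2D alignment.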
{"title":"Online Asymmetric Supervised Discrete Cross-Modal Hashing for Streaming Multimedia Data","authors":"Fan Yang, Xinqi Liu, Fumin Ma, Xiaojian Ding, Kaixiang Wang","doi":"10.1016/j.patcog.2025.111604","DOIUrl":"10.1016/j.patcog.2025.111604","url":null,"abstract":"<div><div>Cross-modal online hashing, which uses freshly received data to retrain the hash function gradually, has become a research hotspot as a means of handling the massive amounts of streaming data that have been brought about by the fast growth of multimedia technology and the popularity of portable devices. However, in the process of processing stream data in most methods, on the one hand, the relationship between modal classes and the common features between label vectors and binary codes is not fully explored. On the other hand, the semantic information in the old and new data modes is not fully utilized. In this post, we offer Online Asymmetric Supervised Discrete Cross-Modal Hashing for Streaming Multimedia Data (OASCH) as a solution. This study integrates the concept cognition mechanism of dynamic incremental samples and an asymmetric knowledge guidance mechanism into the online hash learning framework. The proposed algorithmic model takes into account the knowledge similarity between newly arriving data and the existing dataset, as well as the knowledge similarity within the new data itself. It projects the hash codes associated with new incoming sample data into the potential space of concept cognition. By doing so, the model maximizes the mining of implicit semantic similarities within streaming data across different time points, resulting in the generation of compact hash codes with enhanced discriminative power, we further propose an adaptive edge regression strategy. 
Experiments on three publicly available multimedia retrieval datasets show that our method surpasses several state-of-the-art cross-modal hashing techniques in both retrieval efficiency and search accuracy.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111604"},"PeriodicalIF":7.5,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143697015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
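For readers unfamiliar with cross-modal hashing, the sketch below shows only the generic building blocks every such method shares: a sign-binarized linear hash function and Hamming-distance ranking. It is not the OASCH model; the projection `W` here is random rather than learned, and all names are illustrative.

```python
import numpy as np

def sign_hash(X, W):
    """Project features and binarize: the basic hash-function form B = sign(XW)."""
    return (X @ W >= 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query code."""
    d = (query_code[None, :] != db_codes).sum(axis=1)
    return np.argsort(d, kind="stable")

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 32))          # random stand-in for a learned projection
db = rng.normal(size=(100, 16))        # 100 database items, 16-dim features
codes = sign_hash(db, W)               # 32-bit binary codes
q = db[7] + 0.01 * rng.normal(size=16)  # near-duplicate of item 7
assert hamming_rank(sign_hash(q[None], W)[0], codes)[0] == 7
```

A learned method like OASCH replaces the random `W` with a projection optimized so that semantically similar items across modalities receive nearby codes.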
{"title":"Indoor scene multi-object tracking based on region search and memory buffer pool","authors":"Yang Li , Guanci Yang , Zhidong Su , Shaobo Li , Jing Yang , Ling He","doi":"10.1016/j.patcog.2025.111623","DOIUrl":"10.1016/j.patcog.2025.111623","url":null,"abstract":"<div><div>This study proposes a new Indoor Scene Multi-Object Tracking (IS-MOT) task to complete multi-granularity parsing and continuously track indoor human objects. To foster the IS-MOT task, we refer to the basic human movement composition, combining indoor human motion characteristics, constructing a large-scale multi-object tracking benchmark for indoor social robot perspective, termed Multi-Resident Tracking (MRT). To address the issue of insufficient persistent tracking capability when extending existing MOT methods to the IS-MOT task. A persistent visual multi-object tracking method based on region search and memory buffer pool (PeViTrack) is designed. PeViTrack is mainly composed of a Homogeneous Semantic Memory Buffer Pool (HSMBP) that integrates a Motion State Estimation Module (MSEM) and a Hierarchical Matching Correlation Mechanism (HMCM). HSMBP allows the network to construct an allocation representation based on high and low confidence detection boxes, thereby establishing homogeneous and heterogeneous semantic embedding decision spaces in the spatial domain, thus forcing the network to search and accurately associate object homogeneous and heterogeneous features efficiently. Extensive experiments on the constructed MRT and the well-recognized DanceTrack dataset show that PeViTrack achieves state-of-the-art tracking performance. 
The code and datasets will be made available at <span><span>https://github.com/funweb/PeViTrack</span></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111623"},"PeriodicalIF":7.5,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143783913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
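The hierarchical matching idea the record describes (associate high-confidence detections first, then give remaining tracks a second chance against low-confidence ones) can be sketched with a greedy IoU matcher. This is the generic two-stage scheme, not PeViTrack's HMCM/HSMBP; the thresholds and toy boxes are assumptions.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def hierarchical_match(tracks, dets, scores, hi=0.6, iou_thr=0.3):
    """Two-stage greedy association: match high-confidence detections first,
    then try still-unmatched tracks against low-confidence detections."""
    matches, used = [], set()
    for stage in (lambda s: s >= hi, lambda s: s < hi):  # high pass, then low
        for ti, t in enumerate(tracks):
            if ti in {m[0] for m in matches}:
                continue  # track already claimed in an earlier stage
            best, best_iou = None, iou_thr
            for di, (d, s) in enumerate(zip(dets, scores)):
                if di in used or not stage(s):
                    continue
                if iou(t, d) > best_iou:
                    best, best_iou = di, iou(t, d)
            if best is not None:
                matches.append((ti, best))
                used.add(best)
    return matches

tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets   = [(1, 1, 11, 11), (21, 21, 31, 31)]
scores = [0.9, 0.4]   # the second detection is low-confidence (e.g. occluded)
assert sorted(hierarchical_match(tracks, dets, scores)) == [(0, 0), (1, 1)]
```

The low-confidence second pass is what keeps occluded residents from being dropped, which is the persistence problem the record targets; production trackers replace the greedy loop with Hungarian assignment.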
{"title":"Rank-revealing fully-connected tensor network decomposition and its application to tensor completion","authors":"Yun-Yang Liu , Xi-Le Zhao , Gemine Vivone","doi":"10.1016/j.patcog.2025.111610","DOIUrl":"10.1016/j.patcog.2025.111610","url":null,"abstract":"<div><div>Fully-connected tensor network (FCTN) decomposition has become a powerful tool for handling high-dimensional data. However, for a given <span><math><mi>N</mi></math></span>th-order data, <span><math><mrow><mi>N</mi><mrow><mo>(</mo><mi>N</mi><mo>−</mo><mn>1</mn><mo>)</mo></mrow><mo>/</mo><mn>2</mn></mrow></math></span> tuning parameters (i.e., FCTN rank) in FCTN decomposition is a tricky challenge, which hinders its wide deployments. Although many recent works have emerged to adaptively search for a (near)-optimal FCTN rank, these methods suffer from expensive computational costs since they require too many search and evaluation processes, significantly limiting their applications to high-dimensional data. To tackle the above challenges, we develop a rank-revealing FCTN (revealFCTN) decomposition, whose FCTN rank is adaptively and efficiently inferred. More specifically, by analyzing the sizes of the sub-network tensors in the FCTN decomposition, we establish the equivalent relationships between the FCTN rank and the ranks of single-mode and double-mode unfolding matrices of the given data. The FCTN rank can be directly revealed through the ranks of these unfolding matrices, which does not require any search and evaluation process, making the computational cost almost negligible compared to the search-based methods. To evaluate the performance of the developed revealFCTN decomposition, we test its performance on a representative task: tensor completion (TC). 
Comprehensive experimental results demonstrate that our method outperforms several state-of-the-art methods, achieving an MPSNR gain of around 1 dB in most cases compared to the original FCTN decomposition.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111610"},"PeriodicalIF":7.5,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143697016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
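The core observation above, that a decomposition rank can be read off from the ranks of unfolding matrices, can be reproduced generically: build a low-rank fourth-order tensor and check the numerical ranks of its single-mode and double-mode unfoldings. The construction below uses a Tucker-style contraction as a stand-in; it illustrates the rank-revealing principle, not the exact revealFCTN relations.

```python
import numpy as np

def unfolding_rank(T, modes, tol=1e-8):
    """Numerical rank of the unfolding that groups `modes` as rows
    and the remaining modes as columns."""
    modes = tuple(modes)
    rest = tuple(i for i in range(T.ndim) if i not in modes)
    M = np.transpose(T, modes + rest).reshape(
        int(np.prod([T.shape[i] for i in modes])), -1)
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

# toy 4th-order tensor with known low-rank structure: a (2,2,2,2) core
# contracted with four factor matrices, so every mode has rank 2
rng = np.random.default_rng(1)
G = rng.normal(size=(2, 2, 2, 2))                    # small core tensor
U = [rng.normal(size=(n, 2)) for n in (5, 6, 4, 7)]  # factor matrices
T = np.einsum('abcd,ia,jb,kc,ld->ijkl', G, *U)       # shape (5, 6, 4, 7)
assert unfolding_rank(T, [0]) == 2       # single-mode unfolding rank
assert unfolding_rank(T, [0, 1]) == 4    # double-mode unfolding rank: 2 * 2
```

Because each rank is obtained from one SVD of an unfolding, no candidate-rank search loop is needed, which is the efficiency argument the abstract makes.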
{"title":"Brain anatomy prior modeling to forecast clinical progression of cognitive impairment with structural MRI","authors":"Lintao Zhang , Jinjian Wu , Lihong Wang , Li Wang , David C. Steffens , Shijun Qiu , Guy G. Potter , Mingxia Liu","doi":"10.1016/j.patcog.2025.111603","DOIUrl":"10.1016/j.patcog.2025.111603","url":null,"abstract":"<div><div>Brain structural MRI has been widely used to assess the future progression of cognitive impairment (CI). Previous learning-based studies usually suffer from the issue of small-sized labeled training data, while a huge amount of structural MRIs exist in large-scale public databases. Intuitively, brain anatomical structures derived from these public MRIs (even without task-specific label information) can boost CI progression trajectory prediction. However, previous studies seldom use such brain anatomy structure information as priors. To this end, this paper proposes a brain anatomy prior modeling (BAPM) framework to forecast the clinical progression of cognitive impairment with small-sized target MRIs by exploring anatomical brain structures. Specifically, the BAPM consists of a <em>pretext model</em> and a <em>downstream model</em>, with a shared brain anatomy-guided encoder to model brain anatomy prior using auxiliary tasks explicitly. Besides the encoder, the pretext model also contains two decoders for two auxiliary tasks (<em>i.e.</em>, MRI reconstruction and brain tissue segmentation), while the downstream model relies on a predictor for classification. The brain anatomy-guided encoder is pre-trained with the pretext model on 9,344 auxiliary MRIs without diagnostic labels for anatomy prior modeling. With this encoder frozen, the downstream model is then fine-tuned on limited target MRIs for prediction. We validate BAPM on two CI-related studies with T1-weighted MRIs from 448 subjects. 
Experimental results suggest the effectiveness of BAPM in (1) four CI progression prediction tasks, (2) MR image reconstruction, and (3) brain tissue segmentation, compared with several state-of-the-art methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111603"},"PeriodicalIF":7.5,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143697608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptively robust high-order tensor factorization for low-rank tensor reconstruction","authors":"Zihao Song , Yongyong Chen , Zhao Weihua","doi":"10.1016/j.patcog.2025.111600","DOIUrl":"10.1016/j.patcog.2025.111600","url":null,"abstract":"<div><div>Recently, various approaches have been proposed for tensor reconstruction from incomplete and contaminated data. However, most algorithms focus on third-order tensors, neglecting higher-order tensors that are common in real-world applications. Additionally, many studies use LASSO-type penalties or second-order statistics to capture noise patterns, which may not perform well with dense and gross outliers. To address these challenges, we propose a novel robust high-order tensor recovery model that simultaneously removes complex noise and completes missing entries. We introduce a factor Frobenius norm for the low-rank structures of high-order tensors and derive a nonconvex function via the <span><math><msub><mrow><mi>L</mi></mrow><mrow><mn>2</mn></mrow></msub></math></span> criterion. An estimation algorithm is developed using the alternating minimization method. Our method jointly estimates tensor terms of interest and precision parameters, adapting to noise patterns for data-driven robustness. 
We analyze the convergence properties of our algorithm, and numerical experiments validate its superiority in natural image reconstruction, video restoration, and background modeling compared to state-of-the-art methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111600"},"PeriodicalIF":7.5,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143697610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}