Latest Articles in Pattern Recognition

Leveraging multi-level regularization for efficient Domain Adaptation of Black-box Predictors
IF 7.5, CAS Tier 1 (Computer Science)
Pattern Recognition Pub Date : 2025-03-28 DOI: 10.1016/j.patcog.2025.111611
Wei Li , Wenyi Zhao , Xipeng Pan , Pengcheng Zhou , Huihua Yang
Source-free domain adaptation (SFDA) aims to adapt a source-trained model to a target domain without exposing the source data, addressing concerns about data privacy and security. Nevertheless, this paradigm remains susceptible to data leakage through potential adversarial attacks on the source model. Domain adaptation of black-box predictors (DABP) offers an alternative that requires access to neither the source-domain data nor the predictor parameters. Existing DABP methods, however, have several significant drawbacks: (1) lightweight models may underperform due to limited learning capacity; (2) the potential of the target data is not fully harnessed to learn the structure of the target domain; and (3) focusing exclusively on input-level or network-level regularization leaves feature representations susceptible to noisy pseudo labels, degrading performance. To address these limitations, we introduce a novel approach, Multi-Level Regularization (MLR), for efficient black-box domain adaptation at the network, input, and feature levels. Our MLR framework comprises a teacher–student network in which peer networks use pseudo labels generated by each other for supplementary guidance, thereby learning diverse target representations and alleviating overfitting on the source domain. At the input level, we integrate both local and global interpolation consistency training strategies to capture the inherent structure of the target data. Furthermore, by leveraging input-level and network-level regularizations, we propose a mutual contrastive learning strategy that constructs positive pairs from various network architectures and data augmentations to enhance representation learning. Extensive experiments show that our method achieves state-of-the-art performance on several cross-domain benchmarks with lightweight models, even outperforming many white-box SFDA methods.
Citations: 0
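The abstract does not give MLR's exact losses, but the core peer pseudo-labeling idea (each network is supervised by the other's confident predictions) can be illustrated with a minimal NumPy sketch. The function name, the confidence threshold `tau`, and the masking rule are assumptions for illustration, not the paper's formulation:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mutual_pseudo_label_loss(logits_a, logits_b, tau=0.9):
    """Each network is supervised by its peer's confident pseudo labels.

    Returns the average cross-entropy of network A against B's hard pseudo
    labels and vice versa, counting only samples whose peer confidence
    exceeds tau.
    """
    p_a, p_b = softmax(logits_a), softmax(logits_b)
    losses = []
    for student, teacher in ((p_a, p_b), (p_b, p_a)):
        conf = teacher.max(axis=1)          # peer confidence per sample
        labels = teacher.argmax(axis=1)     # peer hard pseudo labels
        mask = conf >= tau                  # keep only confident samples
        if mask.any():
            ce = -np.log(student[np.arange(len(student)), labels] + 1e-12)
            losses.append((ce * mask).sum() / mask.sum())
    return float(np.mean(losses)) if losses else 0.0
```

When both peers agree confidently the loss is near zero; when neither prediction clears the threshold no supervision is applied, which is one simple way such schemes avoid amplifying noisy pseudo labels.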
Cycle-VQA: A Cycle-Consistent Framework for Robust Medical Visual Question Answering
IF 7.5, CAS Tier 1 (Computer Science)
Pattern Recognition Pub Date : 2025-03-28 DOI: 10.1016/j.patcog.2025.111609
Lin Fan , Xun Gong , Cenyang Zheng , Xuli Tan , Jiao Li , Yafei Ou
Medical Visual Question Answering (Med-VQA) presents greater challenges than traditional Visual Question Answering (VQA) due to the diversity of clinical questions and the complexity of visual reasoning. To address these challenges, we propose Cycle-VQA, a unified framework designed to enhance the reliability and robustness of Med-VQA systems. The framework leverages cycle consistency to establish bidirectional information flow among questions, answers, and visual features, strengthening reasoning stability and ensuring accurate feature integration. Inspired by clinical diagnostic processes, Cycle-VQA incorporates key pathological attributes and introduces a novel multi-modal attribute cross-fusion strategy designed to effectively capture shared and unique features across modalities. Experimental results on Gastrointestinal Stromal Tumor (GIST) diagnosis and public Med-VQA datasets validate the effectiveness of Cycle-VQA, demonstrating its potential to advance medical image analysis and support reliable clinical decision-making.
Citations: 0
FN-NET: Adaptive data augmentation network for fine-grained visual categorization
IF 7.5, CAS Tier 1 (Computer Science)
Pattern Recognition Pub Date : 2025-03-28 DOI: 10.1016/j.patcog.2025.111618
Shuo Ye , Qinmu Peng , Yiu-ming Cheung , Yu Wang , Ziqian Zou , Xinge You
Data augmentation significantly contributes to enhancing model performance, robustness, and generalization ability. However, existing methods struggle when applied directly to fine-grained targets. In particular, during perspective changes, significant details carried by local regions may be obscured or altered, making data augmentation at this point prone to severe overfitting. We argue that subclasses share common discriminative features, and that these features exhibit a certain degree of complementarity. Therefore, in this paper, we propose a novel data augmentation framework for fine-grained targets called the feature expansion and noise fusion network (FN-Net). Specifically, a lightweight branch (aug-branch) is introduced in the middle layer of the convolutional neural network. Feature expansion in this branch creates new semantic combinations from multiple instances by exchanging discriminative regions within the same subclass in the feature space. Noise fusion preserves the noise distribution of the current subclass, enhancing the model's robustness and improving its understanding of instances in real-world environments. Additionally, to prevent potential disruption of the original feature combinations by the feature expansion process, a distillation loss is employed to facilitate the learning process of the aug-branch. We evaluate FN-Net on three FGVC benchmark datasets. The experimental results demonstrate that our method consistently outperforms state-of-the-art approaches across network backbones of different depths and types.
Citations: 0
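The abstract describes feature expansion only at a high level; a rough NumPy sketch of the idea (swapping the most-activated channels between two instances of the same subclass) is given below. Treating the highest-magnitude channels as the "discriminative region", the function name, and the pairing rule are all illustrative assumptions:

```python
import numpy as np

def feature_expansion(feats, labels, k=2, rng=None):
    """Create augmented features by swapping the k most-activated channels
    between two instances that share a subclass label.

    feats: (N, C) pooled feature vectors; labels: (N,) subclass ids.
    Returns a copy of feats with one swap applied per subclass.
    """
    rng = np.random.default_rng(rng)
    out = feats.copy()
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        if len(idx) < 2:        # nothing to exchange within a singleton class
            continue
        i, j = rng.choice(idx, size=2, replace=False)
        # treat the highest-magnitude channels as the discriminative region
        top = np.argsort(-np.abs(feats[i]))[:k]
        out[i, top], out[j, top] = feats[j, top], feats[i, top]
    return out
```

Because values are only exchanged within a channel, per-channel statistics over the batch are preserved while new intra-class feature combinations are created.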
Automated design of neural networks with multi-scale convolutions via multi-path weight sampling
IF 7.5, CAS Tier 1 (Computer Science)
Pattern Recognition Pub Date : 2025-03-27 DOI: 10.1016/j.patcog.2025.111605
Junhao Huang , Bing Xue , Yanan Sun , Mengjie Zhang , Gary G. Yen
The performance of convolutional neural networks (CNNs) relies heavily on architecture design. An increasingly prevalent trend in CNN architecture design is the use of ingeniously crafted building blocks, e.g., the MixConv module, to improve model expressivity and efficiency. To leverage the feature-learning capability of multi-scale convolution while further reducing its computational complexity, this paper presents a computationally efficient yet powerful module, dubbed EMixConv, which combines parameter-free concatenation-based feature reuse with multi-scale convolution. In addition, we propose a one-shot neural architecture search (NAS) method integrating the EMixConv module to automatically search for the optimal combination of the related architectural parameters. Furthermore, an efficient multi-path weight sampling mechanism is developed to enhance the robustness of weight inheritance in the supernet. We demonstrate the effectiveness of the proposed module and the NAS algorithm on three popular image classification tasks. The resulting models, dubbed EMixNets, outperform most state-of-the-art architectures with fewer parameters and computations on the CIFAR datasets. On ImageNet, EMixNet is superior to a majority of compared methods and is also more compact and computationally efficient.
Citations: 0
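The EMixConv internals are not specified in the abstract, but the two ingredients it names, multi-scale convolution over channel groups and parameter-free concatenation-based feature reuse, can be sketched in NumPy. The averaging kernels, the group split, and the function names are stand-in assumptions (a real module would use learned depthwise filters):

```python
import numpy as np

def depthwise_conv1d(x, kernel):
    """Same-padding depthwise 1D convolution. x: (C, T), kernel: (K,), K odd."""
    pad = len(kernel) // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    return np.stack([np.convolve(ch, kernel, mode="valid") for ch in xp])

def emix_block(x, kernel_sizes=(3, 5)):
    """Sketch of a multi-scale mixed convolution with concatenation-based
    feature reuse: channel groups are filtered at different kernel sizes,
    then the raw input is concatenated back in, adding no extra parameters.
    """
    groups = np.array_split(x, len(kernel_sizes), axis=0)
    outs = [depthwise_conv1d(g, np.ones(k) / k)   # averaging kernels as stand-ins
            for g, k in zip(groups, kernel_sizes)]
    return np.concatenate(outs + [x], axis=0)     # parameter-free reuse of x
```

The reuse path is what keeps the block cheap: half of the output channels are a copy of the input, so only the grouped multi-scale filters cost parameters and FLOPs.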
UniHDSA: A unified relation prediction approach for hierarchical document structure analysis
IF 7.5, CAS Tier 1 (Computer Science)
Pattern Recognition Pub Date : 2025-03-27 DOI: 10.1016/j.patcog.2025.111617
Jiawei Wang , Kai Hu , Qiang Huo
Document structure analysis, aka document layout analysis, is crucial for understanding both the physical layout and logical structure of documents, serving information retrieval, document summarization, knowledge extraction, etc. Hierarchical Document Structure Analysis (HDSA) specifically aims to restore the hierarchical structure of documents created using authoring software with hierarchical schemas. Previous research has primarily followed two approaches: one tackles specific subtasks of HDSA in isolation, such as table detection or reading order prediction, while the other adopts a unified framework with multiple branches or modules, each designed to address a distinct task. In this work, we propose a unified relation prediction approach for HDSA, called UniHDSA, which treats the various HDSA subtasks as relation prediction problems and consolidates their labels into a unified label space. This allows a single relation prediction module to handle multiple tasks simultaneously, whether at the page level or the document level. By doing so, our approach significantly reduces the risk of cascading errors and enhances the system's efficiency, scalability, and adaptability. To validate the effectiveness of UniHDSA, we develop a multimodal end-to-end system based on Transformer architectures. Extensive experimental results demonstrate that our approach achieves state-of-the-art performance on a hierarchical document structure analysis benchmark, Comp-HRDoc, and competitive results on a large-scale document layout analysis dataset, DocLayNet, effectively illustrating the superiority of our method across all subtasks.
Citations: 0
Video summarization with temporal-channel visual transformer
IF 7.5, CAS Tier 1 (Computer Science)
Pattern Recognition Pub Date : 2025-03-27 DOI: 10.1016/j.patcog.2025.111631
Xiaoyan Tian , Ye Jin , Zhao Zhang , Peng Liu , Xianglong Tang
The video summarization task has gained widespread interest, benefiting from its valuable capability for efficient video browsing. Existing approaches generally focus on inter-frame temporal correlations, which may not be sufficient to identify crucial content because of the limited useful details that can be gleaned. To resolve these issues, we propose a novel transformer-based approach for video summarization, called the Temporal-Channel Visual Transformer (TCVT). The proposed TCVT consists of three components: a dual-stream embedding module, an inter-frame encoder, and an intra-segment encoder. The dual-stream embedding module creates the fused embedding sequence by extracting visual features and short-range optical-flow features, preserving appearance and motion details. Temporal-channel inter-frame correlations are learned by the inter-frame encoder with multiple temporal and channel attention modules, while intra-segment representations are captured by the intra-segment encoder for local temporal context modeling. Finally, we fuse the frame-level and segment-level representations for frame-wise importance score prediction. Our network outperforms state-of-the-art methods on two benchmark datasets, improving results from 55.3% to 56.9% on SumMe and from 69.3% to 70.4% on TVSum.
Citations: 0
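The abstract pairs temporal attention (across frames) with channel attention (across feature dimensions) but does not define the modules; a bare NumPy sketch of that pairing is below. The scaled dot-product form, the additive fusion, and all names are assumptions; TCVT's actual modules are certainly more elaborate:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_channel_attention(frames):
    """frames: (T, C) frame features.

    Temporal attention mixes information across frames; channel attention
    reweights feature dimensions; the two outputs are summed as a simple fusion.
    """
    T, C = frames.shape
    # temporal attention: each frame attends to every other frame
    scores_t = frames @ frames.T / np.sqrt(C)            # (T, T)
    out_t = softmax(scores_t, axis=-1) @ frames          # (T, C)
    # channel attention: correlations between feature dimensions
    scores_c = frames.T @ frames / np.sqrt(T)            # (C, C)
    out_c = frames @ softmax(scores_c, axis=-1)          # (T, C)
    return out_t + out_c
```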
Visual fidelity and full-scale interaction driven network for infrared and visible image fusion
IF 7.5, CAS Tier 1 (Computer Science)
Pattern Recognition Pub Date : 2025-03-26 DOI: 10.1016/j.patcog.2025.111612
Liye Mei , Xinglong Hu , Zhaoyi Ye , Zhiwei Ye , Chuan Xu , Sheng Liu , Cheng Lei
The objective of infrared and visible image fusion is to combine the unique strengths of the source images into a single image that serves both human visual perception and machine detection. Existing fusion networks still fall short in effectively characterizing and retaining source image features. To counter these deficiencies, we propose a visual fidelity and full-scale interaction driven network for infrared and visible image fusion, named VFFusion. First, a multi-scale feature encoder based on BiFormer is constructed, and a feature cascade interaction module is designed to perform full-scale interaction on features distributed across different scales. In addition, a visual fidelity branch is built to process multi-scale features in parallel with the fusion branch. Specifically, the visual fidelity branch uses blurred images for self-supervised training in a constructed auxiliary task, thereby obtaining an effective representation of the source image information. By exploiting the complementary representational features of infrared and visible images as supervisory information, it constrains the fusion branch to retain the source image features in the fused image. Notably, the visual fidelity branch employs a multi-scale joint reconstruction loss, using the rich supervisory signals provided by multi-scale original images to enhance the feature representation of targets at different scales, resulting in clear fusion of the targets. Extensive qualitative and quantitative comparisons against nine advanced methods on four datasets demonstrate the superiority of our approach. The source code is available at https://github.com/XingLongH/VFFusion.
Citations: 0
Complementary label learning with multi-view data and a semi-supervised labeling mechanism
IF 7.5, CAS Tier 1 (Computer Science)
Pattern Recognition Pub Date : 2025-03-26 DOI: 10.1016/j.patcog.2025.111651
Long Tang , Yelei Liu , Yingjie Tian , Panos M Pardalos
Rooted in a form of inexact supervision, complementary label learning (CLL) relieves the burden of labeling numerous training samples with definite categories by describing each sample through one or several incorrect categories. Although existing approaches adopt diverse network structures, learning paradigms, and loss functions to facilitate CLL, developing a dependable classifier from the provided complementary labels remains challenging. To this end, we propose MVSSCLL, a novel CLL method that integrates multi-view fusion with a semi-supervised labeling mechanism. MVSSCLL adaptively learns the label distribution of the training samples by leveraging the semi-supervised labeling mechanism, while a multi-view feature-fusion approach following the consensus and complementarity principles is embedded. This integration enhances the extraction of valuable information from multi-view feature data with complementary labels. Experimentally, MVSSCLL significantly surpasses state-of-the-art methods, with a maximum accuracy advantage of 43.11% over the second-best method. These advancements greatly improve the performance of CLL without increasing labeling costs.
Citations: 0
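As background for how a classifier can learn from "this sample is NOT class c" supervision, here is the classic negative-learning loss for complementary labels in NumPy. This is a standard baseline formulation, not MVSSCLL's loss, and the function name is an assumption:

```python
import numpy as np

def complementary_loss(logits, comp_labels):
    """Negative-learning loss for complementary labels: push down the
    probability of the class each sample is known NOT to belong to,
    i.e. minimize -log(1 - p_comp) averaged over the batch.

    logits: (N, K); comp_labels: (N,) complementary class indices.
    """
    z = logits - logits.max(axis=1, keepdims=True)     # stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    p_comp = p[np.arange(len(p)), comp_labels]         # prob of the wrong class
    return float(-np.log(1.0 - p_comp + 1e-12).mean())
```

The loss is near zero when the model already assigns little mass to the complementary class, and grows sharply when the model confidently predicts the class it was told is wrong.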
Bidirectional trained tree-structured decoder for Handwritten Mathematical Expression Recognition
IF 7.5, CAS Tier 1 (Computer Science)
Pattern Recognition Pub Date : 2025-03-25 DOI: 10.1016/j.patcog.2025.111599
Hanbo Cheng , Chenyu Liu , Pengfei Hu , Zhenrong Zhang , Jiefeng Ma , Jun Du
The Handwritten Mathematical Expression Recognition (HMER) task is a critical branch of Optical Character Recognition (OCR). Recent studies have demonstrated that incorporating bidirectional context information significantly improves the performance of HMER models. However, existing methods fail to effectively utilize bidirectional context information during the inference stage. Furthermore, current bidirectional training methods are primarily designed for string decoders and do not generalize adequately to tree decoders, which offer superior generalization and structural analysis capacity. To overcome these limitations, we propose the Mirror-Flipped Symbol Layout Tree (MF-SLT) and the Bidirectional Asynchronous Training (BAT) structure. Our method extends the bidirectional training strategy to the tree decoder, enabling more effective training by leveraging bidirectional information. Additionally, we analyze the impact of the visual and linguistic perception of the HMER model separately and introduce the Shared Language Modeling (SLM) mechanism. Through SLM, we enhance the model's robustness and generalization when dealing with visual ambiguity, especially in scenarios with abundant training data. Our approach has been validated through extensive experiments, achieving new state-of-the-art results on the CROHME 2014, 2016, and 2019 datasets, as well as the HME100K dataset. The code used in our experiments will be publicly available at https://github.com/Hanbo-Cheng/BAT.git.
Citations: 0
Learning hyperspectral noisy label with global and local hypergraph laplacian energy
IF 7.5, CAS Tier 1 (Computer Science)
Pattern Recognition Pub Date : 2025-03-24 DOI: 10.1016/j.patcog.2025.111606
Cheng Shi , Linfeng Lu , Minghua Zhao , Xinhong Hei , Chi-Man Pun , Qiguang Miao
Deep learning has achieved significant advances in hyperspectral image (HSI) classification, yet it is highly dependent on the availability of high-quality labeled data. However, acquiring such labeled data for HSIs is often challenging due to the associated high costs and complexity. Consequently, the problem of classifying HSIs with noisy labels has garnered increasing attention. To mitigate the negative effects of noisy labels, various methods have employed label correction strategies with promising results. Nevertheless, these techniques typically correct labels based on small-loss samples or neighborhood similarity. In high-noise environments, such methods often face unstable training, and the unreliability of neighborhood samples restricts their effectiveness. To overcome these limitations, this paper proposes a label correction method that leverages both global and local hypergraph structures to estimate label confidence and correct mislabeled samples. In contrast to traditional graph-based approaches, hypergraphs can capture higher-order relationships among samples, improving the accuracy of label correction. The proposed method minimizes both global and local hypergraph Laplacian energies to enhance label consistency and accuracy across the dataset. Furthermore, contrastive learning and the Mixup technique are integrated to bolster the robustness and discriminative capability of HSI classification networks. Extensive experiments on four publicly available hyperspectral datasets, University of Pavia (UP), Salinas Valley (SV), Kennedy Space Center (KSC), and WHU-Hi-HanChuan (HC), demonstrate the superior performance of the proposed method, particularly in scenarios with high noise levels, where substantial improvements in classification accuracy are observed. The code is available at https://github.com/AAAA-CS/GLHLE.
Citations: 0
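The abstract leans on hypergraph Laplacian energy as a smoothness score for labels. A minimal NumPy sketch of the standard normalized hypergraph Laplacian and its label energy is below; the paper's actual global/local construction and confidence estimation are not specified in the abstract, so this only shows the underlying quantity (assumes every vertex lies in at least one hyperedge):

```python
import numpy as np

def hypergraph_laplacian(H, w=None):
    """Normalized hypergraph Laplacian L = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}.

    H: (n_vertices, n_edges) binary incidence matrix; w: hyperedge weights.
    """
    n, m = H.shape
    w = np.ones(m) if w is None else np.asarray(w, float)
    dv = H @ w                              # vertex degrees (must be nonzero)
    de = H.sum(axis=0)                      # hyperedge degrees
    Dv = np.diag(1.0 / np.sqrt(dv))
    theta = Dv @ H @ np.diag(w / de) @ H.T @ Dv
    return np.eye(n) - theta

def label_energy(H, Y):
    """Laplacian energy tr(Y^T L Y) of a one-hot label matrix Y: lower energy
    means labels vary less within hyperedges, i.e. they better satisfy the
    smoothness prior used to flag likely mislabeled samples."""
    return float(np.trace(Y.T @ hypergraph_laplacian(H) @ Y))
```

With a single hyperedge covering three samples, giving all three the same label yields zero energy, while a mixed labeling does not; minimizing this energy is what drives the label correction.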