{"title":"A robust transductive distribution calibration method for few-shot learning","authors":"Jingcong Li, Chunjin Ye, Fei Wang, Jiahui Pan","doi":"10.1016/j.patcog.2025.111488","DOIUrl":"10.1016/j.patcog.2025.111488","url":null,"abstract":"<div><div>Few-shot learning (FSL) has gained much attention and has recently made substantial progress. To alleviate the data constraints in FSL, previous studies have attempted to generate features by learning a feature distribution. However, the learned distribution is biased and unstable due to limited labeled data, and the features generated from it can be even more biased, which decreases generalizability. This paper proposes a Robust Transductive Distribution Calibration (RTDC) method to estimate the feature distributions of few-shot classes in a more accurate and robust way. First, we capture the underlying distribution information by precisely estimating the covariance matrix of each novel category. Second, we consider the distribution similarity between labeled and unlabeled samples using the estimated covariance matrix and then optimize the feature distribution in a transductive manner. Extensive experiments demonstrate the effectiveness and significance of our method on several FSL benchmarks, including <em>mini</em>ImageNet, <em>tiered</em>ImageNet, CUB, and CIFAR-FS.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"163 ","pages":"Article 111488"},"PeriodicalIF":7.5,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143527116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust shortcut and disordered robustness: Improving adversarial training through adaptive smoothing","authors":"Lin Li , Michael Spratling","doi":"10.1016/j.patcog.2025.111474","DOIUrl":"10.1016/j.patcog.2025.111474","url":null,"abstract":"<div><div>Deep neural networks are highly susceptible to adversarial perturbations: artificial noise that corrupts input data in ways imperceptible to humans but causes incorrect predictions. Among the various defenses against these attacks, adversarial training has emerged as the most effective. In this work, we aim to enhance adversarial training to improve robustness against adversarial attacks. We begin by analyzing how adversarial vulnerability evolves during training from an instance-wise perspective. This analysis reveals two previously unrecognized phenomena: <em>robust shortcut</em> and <em>disordered robustness</em>. We then demonstrate that these phenomena are related to <em>robust overfitting</em>, a well-known issue in adversarial training. Building on these insights, we propose a novel adversarial training method: Instance-adaptive Smoothness Enhanced Adversarial Training (ISEAT). This method jointly smooths the input and weight loss landscapes in an instance-adaptive manner, preventing the exploitation of the robust shortcut and thereby mitigating robust overfitting. Extensive experiments demonstrate the efficacy of ISEAT and its superiority over existing adversarial training methods. Code is available at <span><span>https://github.com/TreeLLi/ISEAT</span></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"163 ","pages":"Article 111474"},"PeriodicalIF":7.5,"publicationDate":"2025-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143511303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Embedded multi-label feature selection via orthogonal regression","authors":"Xueyuan Xu , Fulin Wei , Tianze Yu , Jinxin Lu , Aomei Liu , Li Zhuo , Feiping Nie , Xia Wu","doi":"10.1016/j.patcog.2025.111477","DOIUrl":"10.1016/j.patcog.2025.111477","url":null,"abstract":"<div><div>In the last decade, embedded multi-label feature selection methods, which incorporate the search for feature subsets into model optimization, have attracted considerable attention for accurately evaluating the importance of features in multi-label classification tasks. Nevertheless, state-of-the-art embedded multi-label feature selection algorithms based on least square regression usually cannot preserve sufficient discriminative information in multi-label data. To tackle this challenge, a novel embedded multi-label feature selection method, termed global redundancy and relevance optimization in orthogonal regression (GRROOR), is proposed to facilitate multi-label feature selection. The method employs orthogonal regression with feature weighting to retain sufficient statistical and structural information related to local label correlations of the multi-label data in the feature learning process. Additionally, both global feature redundancy and global label relevancy information are considered in the orthogonal regression model, which contributes to the search for discriminative and non-redundant feature subsets in the multi-label data. The cost function of GRROOR is an unbalanced orthogonal Procrustes problem on the Stiefel manifold. A simple yet effective scheme is utilized to obtain an optimal solution. Extensive experimental results on multiple multi-label data sets demonstrate the effectiveness of GRROOR.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"163 ","pages":"Article 111477"},"PeriodicalIF":7.5,"publicationDate":"2025-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143519700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Texture and noise dual adaptation for infrared image super-resolution","authors":"Yongsong Huang , Tomo Miyazaki , Xiaofeng Liu , Yafei Dong , Shinichiro Omachi","doi":"10.1016/j.patcog.2025.111449","DOIUrl":"10.1016/j.patcog.2025.111449","url":null,"abstract":"<div><div>Recent efforts have explored leveraging visible light images to enrich texture details in infrared (IR) super-resolution. However, this direct adaptation approach often becomes a double-edged sword, as it improves texture at the cost of introducing noise and blurring artifacts. Such imperfections are inherent in the spatial domain of visible images and are accentuated during the imaging process. Enhancing IR image quality by integrating rich texture details from visible images, while minimizing noise transfer, presents a challenging research avenue. To address these challenges, we propose the Texture and Noise Dual Adaptation SRGAN (DASRGAN), an innovative framework specifically engineered for robust IR super-resolution model adaptation. DASRGAN operates on the synergy of two key components: (1) Texture-Oriented Adaptation (TOA) to refine texture details meticulously, and (2) Noise-Oriented Adaptation (NOA), dedicated to minimizing noise transfer. Specifically, TOA uniquely integrates a specialized discriminator, incorporating a prior extraction branch, and employs a Sobel-guided adversarial loss to align texture distributions effectively. Concurrently, NOA utilizes a noise adversarial loss to distinctly separate the generative and Gaussian noise pattern distributions during adversarial training. Our extensive experiments confirm DASRGAN’s superiority. Comparative analyses against leading methods across multiple benchmarks and upsampling factors reveal that DASRGAN sets new state-of-the-art performance standards. Code is available at <span><span>https://github.com/yongsongH/DASRGAN</span></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"163 ","pages":"Article 111449"},"PeriodicalIF":7.5,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143508875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning physical-aware diffusion priors for zero-shot restoration of scattering-affected images","authors":"Yuanjian Qiao , Mingwen Shao , Lingzhuang Meng , Wangmeng Zuo","doi":"10.1016/j.patcog.2025.111473","DOIUrl":"10.1016/j.patcog.2025.111473","url":null,"abstract":"<div><div>Zero-shot image restoration methods using pre-trained diffusion models have recently achieved remarkable success, tackling image degradation without requiring paired data. However, these methods struggle to handle real-world images with intricate nonlinear scattering degradations due to the lack of physical knowledge. To address this challenge, we propose a novel Physical-aware Diffusion model (PhyDiff) for zero-shot restoration of scattering-affected images, which involves two crucial physical guidance strategies: Transmission-guided Conditional Generation (TCG) and Prior-aware Sampling Regularization (PSR). Specifically, the TCG exploits the transmission map that reflects the degradation density to dynamically guide the restoration of different corrupted regions during the reverse diffusion process. Simultaneously, the PSR leverages the inherent statistical properties of natural images to regularize the sampling output, thereby improving the quality of the recovered image. With these ingenious guidance schemes, our PhyDiff achieves high-quality restoration of multiple nonlinear degradations in a zero-shot manner. Extensive experiments on real-world degraded images demonstrate that our method outperforms existing methods both quantitatively and qualitatively.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"163 ","pages":"Article 111473"},"PeriodicalIF":7.5,"publicationDate":"2025-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143508874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AAGCN: An adaptive data augmentation for graph contrastive learning","authors":"Peng Qin , Yaochun Lu , Weifu Chen , Defang Li , Guocan Feng","doi":"10.1016/j.patcog.2025.111471","DOIUrl":"10.1016/j.patcog.2025.111471","url":null,"abstract":"<div><div>Contrastive learning has achieved great success in many applications. A key step in contrastive learning is to find a positive sample and negative samples. Traditional methods find the positive sample by choosing the most similar sample. A more popular approach is data augmentation, where the original data and the augmented data are naturally treated as positive pairs. Augmentation is easy for grid data: for example, we can rotate, crop, or recolor an image to obtain augmented images. But augmentation is challenging for graph data, due to the non-Euclidean nature of graphs. Current graph augmentation methods mainly focus on masking nodes, dropping edges, or extracting subgraphs. Such methods lack flexibility and require intensive manual settings. In this work, we propose a model called <em>Adaptive Augmentation Graph Convolutional Network (AAGCN)</em> for semi-supervised node classification, based on adaptive graph augmentation. Rather than choosing a fixed probability distribution, such as the Bernoulli distribution used in Dropout, to drop some of the nodes or edges, the proposed model learns the mask matrices for nodes or edges adaptively. Experiments on citation networks such as Cora, CiteSeer and Cora-ML show that AAGCN achieved state-of-the-art performance compared with other popular graph neural networks. The proposed model was also tested on a more challenging and large-scale graph dataset, OGBN-Arxiv, which has 169,343 nodes and 1,166,243 edges. The proposed model could still achieve competitive prediction results.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"163 ","pages":"Article 111471"},"PeriodicalIF":7.5,"publicationDate":"2025-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143480565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AdaNet: A competitive adaptive convolutional neural network for spectral information identification","authors":"Ziyang Li , Yang Yu , Chongbo Yin , Yan Shi","doi":"10.1016/j.patcog.2025.111472","DOIUrl":"10.1016/j.patcog.2025.111472","url":null,"abstract":"<div><div>Spectral analysis-based non-destructive testing techniques can monitor food authenticity, quality changes, and traceability. Convolutional neural networks (CNNs) are widely used for spectral information processing and decision-making because they can effectively extract features from spectral data. However, CNNs introduce redundancy in feature extraction, thereby wasting computational resources. This paper proposes a competitive adaptive CNN (AdaNet) to address these challenges. First, adaptive convolution (AdaConv) is used to select spectral features based on channel attention and optimize computational resource allocation. Second, a Gaussian-initialized parameter matrix is applied to rescale spatial relationships and reduce redundancy. Finally, a self-attention mask is employed to mitigate the information loss due to convolution and speed up the convergence of AdaConv. We evaluate AdaNet’s performance compared to other advanced methods. The results show that AdaNet outperforms state-of-the-art techniques, achieving average accuracies of 99.10% and 98.50% on datasets 1 and 2, respectively. We provide a viable approach to enhance the engineering applications of spectral analysis techniques. Code is available at <span><span>https://github.com/Ziyang-Li-AILab/AdaNet</span></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"163 ","pages":"Article 111472"},"PeriodicalIF":7.5,"publicationDate":"2025-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143511302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tensor Transformer for hyperspectral image classification","authors":"Wei-Tao Zhang, Yv Bai, Sheng-Di Zheng, Jian Cui, Zhen-zhen Huang","doi":"10.1016/j.patcog.2025.111470","DOIUrl":"10.1016/j.patcog.2025.111470","url":null,"abstract":"<div><div>Hyperspectral image (HSI) data is widely used in real-world classification tasks since it contains rich spatial and spectral features consisting of hundreds of continuous bands. In recent years, deep learning-based HSI classification methods, such as the convolutional neural network (CNN) and Transformer, have achieved good performance in HSI classification tasks. Indeed, it is acknowledged that Transformer-based neural networks, owing to their remarkable capacity to extract long-range features, frequently outperform CNN-based neural networks in HSI classification scenarios. However, Transformer-based methods always require the sequentialization of the raw 3-D HSI data, potentially disrupting the spatial–spectral structural features. This shortcoming degrades the classification accuracy on HSI data. In this paper, we propose a Tensor Transformer (TT) framework for HSI classification. The TT model is an end-to-end network that directly takes the raw HSI tensor data as the input sample, without the need for raw data sequentialization. The core component of the proposed framework is the Tensor Self-Attention Mechanism (TSAM), which enables the network to efficiently extract long-range spatial–spectral structural features without losing the inherent structural relationships within the sample. Through extensive experiments on four widely used HSI datasets, the proposed TT model demonstrates superior classification performance in discriminating land features with similar spectra compared to state-of-the-art methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"163 ","pages":"Article 111470"},"PeriodicalIF":7.5,"publicationDate":"2025-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143488219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DiffFace: Diffusion-based face swapping with facial guidance","authors":"Kihong Kim , Yunho Kim , Seokju Cho , Junyoung Seo , Jisu Nam , Kychul Lee , Seungryong Kim , KwangHee Lee","doi":"10.1016/j.patcog.2025.111451","DOIUrl":"10.1016/j.patcog.2025.111451","url":null,"abstract":"<div><div>We propose a novel diffusion-based framework for face swapping, called DiffFace. Unlike previous GAN-based models that inherit the challenges of GAN training, an ID-conditional DDPM is trained to produce face images with a specified identity. During the sampling process, off-the-shelf facial expert models are employed to ensure the model can transfer the source identity while maintaining the target attributes such as structure and gaze. In addition, the target-preserving blending effectively preserves the expression of the target image from noise, while reflecting the environmental context such as background or lighting. The proposed method enables controlling the trade-off between ID and shape without any further re-training. Compared with previous GAN-based methods, DiffFace achieves high fidelity and controllability. Extensive experiments show that DiffFace is comparable or superior to the state-of-the-art methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"163 ","pages":"Article 111451"},"PeriodicalIF":7.5,"publicationDate":"2025-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143511301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-definition Deepfake detection via semantics reduction and cross-domain training","authors":"Cairong Zhao , Chutian Wang , Zifan Song , Guosheng Hu , Liang Wang , Duoqian Miao","doi":"10.1016/j.patcog.2025.111469","DOIUrl":"10.1016/j.patcog.2025.111469","url":null,"abstract":"<div><div>The recent development of Deepfake videos directly threatens our information security and personal privacy. Although many previous works have made progress on Deepfake detection, we empirically find that existing approaches do not perform well on low-definition (LD) and cross-definition (high and low) videos. To address this problem, in this paper, we follow two motivations: (1) high-level semantics reduction and (2) cross-domain training. For (1), we propose the Facial Structure Destruction and Adversarial Jigsaw Loss to prevent our model from learning high-level semantics and to focus it on learning low-level discriminative information; for (2), we propose an adversarial domain generalization method and a spatial attention distillation which uses the information of HD videos to guide LD videos. We conduct extensive experiments on public datasets, FaceForensics++ and Celeb-DF v2. Results show the effectiveness of our method, which also achieves very competitive performance against state-of-the-art methods. Surprisingly, we empirically find that our method is also very effective on the Face Anti-Spoofing (FAS) task, as verified on the OULU-NPU dataset.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"163 ","pages":"Article 111469"},"PeriodicalIF":7.5,"publicationDate":"2025-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143480563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}