Consistency-driven feature scoring and regularization network for visible–infrared person re-identification
Xueting Chen, Yan Yan, Jing-Hao Xue, Nannan Wang, Hanzi Wang
Pattern Recognition, Volume 159, Article 111131. Published 2024-11-02. DOI: 10.1016/j.patcog.2024.111131
Abstract: Recently, visible–infrared person re-identification (VI-ReID) has received considerable attention due to its practical importance. A number of methods extract multiple local features to enrich the diversity of feature representations. However, some local features often involve modality-relevant information, leading to deteriorated performance. Moreover, existing methods optimize the models by considering only the samples in each batch while ignoring the features learned at previous iterations. As a result, the features of the same person's images change drastically across training epochs, hindering training stability. To alleviate these issues, we propose a novel consistency-driven feature scoring and regularization network (CFSR-Net) for VI-ReID, which consists of a backbone network, a local feature learning block, a feature scoring block, and a global–local feature fusion block. On the one hand, we design a cross-modality consistency loss to highlight modality-irrelevant local features and suppress modality-relevant local features for each modality, facilitating the generation of a reliable, compact local feature. On the other hand, we develop a feature consistency regularization strategy (including a momentum class contrastive loss and a momentum distillation loss) that imposes consistency regularization on the learning of different levels of features by considering the features learned at historical epochs. This enables smooth feature changes and thus improves training stability. Extensive experiments on public VI-ReID datasets clearly show the effectiveness of our method against several state-of-the-art VI-ReID methods. Code will be released at https://github.com/cxtjl/CFSR-Net.
{"title":"ANNE: Adaptive Nearest Neighbours and Eigenvector-based sample selection for robust learning with noisy labels","authors":"Filipe R. Cordeiro , Gustavo Carneiro","doi":"10.1016/j.patcog.2024.111132","DOIUrl":"10.1016/j.patcog.2024.111132","url":null,"abstract":"<div><div>An important stage of most state-of-the-art (SOTA) noisy-label learning methods consists of a sample selection procedure that classifies samples from the noisy-label training set into noisy-label or clean-label subsets. The process of sample selection typically consists of one of the two approaches: loss-based sampling, where high-loss samples are considered to have noisy labels, or feature-based sampling, where samples from the same class tend to cluster together in the feature space and noisy-label samples are identified as anomalies within those clusters. Empirically, loss-based sampling is robust to a wide range of noise rates, while feature-based sampling tends to work effectively in particular scenarios, e.g., the filtering of noisy instances via their eigenvectors (FINE) sampling exhibits greater robustness in scenarios with low noise rates, and the K nearest neighbour (KNN) sampling mitigates better high noise-rate problems. This paper introduces the Adaptive Nearest Neighbours and Eigenvector-based (ANNE) sample selection methodology, a novel approach that integrates loss-based sampling with the feature-based sampling methods FINE and Adaptive KNN to optimize performance across a wide range of noise rate scenarios. ANNE achieves this integration by first partitioning the training set into high-loss and low-loss sub-groups using loss-based sampling. Subsequently, within the low-loss subset, sample selection is performed using FINE, while the high-loss subset employs Adaptive KNN for effective sample selection. We integrate ANNE into the noisy-label learning state of the art (SOTA) method SSR+, and test it on CIFAR-10/-100 (with symmetric, asymmetric and instance-dependent noise), Webvision and ANIMAL-10, where our method shows better accuracy than the SOTA in most experiments, with a competitive training time. The code is available at <span><span>https://github.com/filipe-research/anne</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111132"},"PeriodicalIF":7.5,"publicationDate":"2024-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142593964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-supervised random mask attention GAN in tackling pose-invariant face recognition","authors":"Jiashu Liao , Tanaya Guha , Victor Sanchez","doi":"10.1016/j.patcog.2024.111112","DOIUrl":"10.1016/j.patcog.2024.111112","url":null,"abstract":"<div><div>Pose Invariant Face Recognition (PIFR) has significantly advanced with Generative Adversarial Networks (GANs), which rotate face images acquired at any angle to a frontal view for enhanced recognition. However, such frontalization methods typically need ground-truth frontal-view images, often collected under strict laboratory conditions, making it challenging and costly to acquire the necessary training data. Additionally, traditional self-supervised PIFR methods rely on external rendering models for training, further complicating the overall training process. To tackle these two issues, we propose a new framework called <em>Mask Rotate</em>. Our framework introduces a novel training approach that requires no paired ground truth data for the face image frontalization task. Moreover, it eliminates the need for an external rendering model during training. Specifically, our framework simplifies the face image frontalization task by transforming it into a face image completion task. During the inference or testing stage, it employs a reliable pre-trained rendering model to obtain a frontal-view face image, which may have several regions with missing texture due to pose variations and occlusion. Our framework then uses a novel self-supervised <em>Random Mask</em> Attention Generative Adversarial Network (RMAGAN) to fill in these missing regions by considering them as randomly masked regions. Furthermore, our proposed <em>Mask Rotate</em> framework uses a reliable post-processing model designed to improve the visual quality of the face images after frontalization. In comprehensive experiments, the <em>Mask Rotate</em> framework eliminates the requirement for complex computations during training and achieves strong results, both qualitative and quantitative, compared to the state-of-the-art.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111112"},"PeriodicalIF":7.5,"publicationDate":"2024-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SPColor: Semantic prior guided exemplar-based image colorization
Siqi Chen, Xianlin Zhang, Mingdao Wang, Xueming Li, Yu Zhang, Yue Zhang
Pattern Recognition, Volume 159, Article 111109. Published 2024-11-02. DOI: 10.1016/j.patcog.2024.111109
Abstract: Exemplar-based image colorization aims to colorize a target grayscale image based on a color reference image, and the key is to establish accurate pixel-level semantic correspondence between these two images. Previous methods directly search for correspondence over the entire reference image, and this type of global matching is prone to mismatch. Intuitively, a reasonable correspondence should be established between objects that are semantically similar. Motivated by this, we introduce the idea of a semantic prior and propose SPColor, a semantic prior guided exemplar-based image colorization framework. Several novel components are systematically designed in SPColor, including a semantic prior guided correspondence network (SPC), a category reduction algorithm (CRA), and a similarity masked perceptual loss (SMP loss). Different from previous methods, SPColor establishes correspondence between pixels in the same semantic class locally. In this way, improper correspondence between different semantic classes is explicitly excluded, and mismatch is markedly alleviated. In addition, SPColor supports region-level class assignments before SPC in the pipeline. With this feature, a category manipulation process (CMP) is proposed as an interactive interface to control colorization, which can also produce more varied colorization results and improve the flexibility of reference selection. Experiments demonstrate that our model outperforms recent state-of-the-art methods both quantitatively and qualitatively on public datasets. Our code is available at https://github.com/viector/spcolor.
Enhancing robust VQA via contrastive and self-supervised learning
Runlin Cao, Zhixin Li, Zhenjun Tang, Canlong Zhang, Huifang Ma
Pattern Recognition, Volume 159, Article 111129. Published 2024-11-02. DOI: 10.1016/j.patcog.2024.111129
Abstract: Visual Question Answering (VQA) aims to evaluate the reasoning abilities of an intelligent agent using visual and textual information. However, recent research indicates that many VQA models rely primarily on learning the correlation between questions and answers in the training dataset rather than demonstrating actual reasoning ability. To address this limitation, we propose a novel training approach called Enhancing Robust VQA via Contrastive and Self-supervised Learning (CSL-VQA) to construct a more robust VQA model. Our approach involves generating two types of negative samples to balance the biased data, using self-supervised auxiliary tasks to help the base VQA model overcome language priors, and filtering out biased training samples. In addition, we construct positive samples by removing spurious correlations in biased samples and perform auxiliary training through contrastive learning. Our approach does not require additional annotations and is compatible with different VQA backbones. Experimental results demonstrate that CSL-VQA significantly outperforms current state-of-the-art approaches, achieving an accuracy of 62.30% on the VQA-CP v2 dataset, while maintaining robust performance on the in-distribution VQA v2 dataset. Moreover, our method shows superior generalization capabilities on challenging datasets such as GQA-OOD and VQA-CE, proving its effectiveness in reducing language bias and enhancing the overall robustness of VQA models.
{"title":"TransMatch: Transformer-based correspondence pruning via local and global consensus","authors":"Yizhang Liu , Yanping Li , Shengjie Zhao","doi":"10.1016/j.patcog.2024.111120","DOIUrl":"10.1016/j.patcog.2024.111120","url":null,"abstract":"<div><div>Correspondence pruning aims to filter out false correspondences (a.k.a. outliers) from the initial feature correspondence set, which is pivotal to matching-based vision tasks, such as image registration. To solve this problem, most existing learning-based methods typically use a multilayer perceptron framework and several well-designed modules to capture local and global contexts. However, few studies have explored how local and global consensuses interact to form cohesive feature representations. This paper proposes a novel framework called TransMatch, which leverages the full power of Transformer structure to extract richer features and facilitate progressive local and global consensus learning. In addition to enhancing feature learning, Transformer is used as a powerful tool to connect the above two consensuses. Benefiting from Transformer, our TransMatch is surprisingly effective for differentiating correspondences. Experimental results on correspondence pruning and camera pose estimation demonstrate that the proposed TransMatch outperforms other state-of-the-art methods by a large margin. The code will be available at <span><span>https://github.com/lyz8023lyp/TransMatch/</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111120"},"PeriodicalIF":7.5,"publicationDate":"2024-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"L2T-DFM: Learning to Teach with Dynamic Fused Metric","authors":"Zhaoyang Hai, Liyuan Pan, Xiabi Liu, Mengqiao Han","doi":"10.1016/j.patcog.2024.111124","DOIUrl":"10.1016/j.patcog.2024.111124","url":null,"abstract":"<div><div>The loss function plays a crucial role in the construction of machine learning algorithms. Employing a teacher model to set loss functions dynamically for student models has attracted attention. In existing works, (1) the characterization of the dynamic loss suffers from some inherent limitations, <em>ie</em>, the computational cost of loss networks and the restricted similarity measurement handcrafted loss functions; and (2) the states of the student model are provided to the teacher model directly without integration, causing the teacher model to underperform when trained on insufficient amounts of data. To alleviate the above-mentioned issues, in this paper, we select and weigh a set of similarity metrics by a confidence-based selection algorithm and a temporal teacher model to enhance the dynamic loss functions. Subsequently, to integrate the states of the student model, we employ statistics to quantify the information loss of the student model. Extensive experiments demonstrate that our approach can enhance student learning and improve the performance of various deep models on real-world tasks, including classification, object detection, and semantic segmentation scenarios.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111124"},"PeriodicalIF":7.5,"publicationDate":"2024-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142593956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Self-distillation with beta label smoothing-based cross-subject transfer learning for P300 classification
Shurui Li, Liming Zhao, Chang Liu, Jing Jin, Cuntai Guan
Pattern Recognition, Volume 159, Article 111114. Published 2024-11-02. DOI: 10.1016/j.patcog.2024.111114
Abstract:
Background: The P300 speller is one of the most well-known brain-computer interface (BCI) systems, offering users a novel way to communicate with their environment by decoding brain activity.
Problem: However, most P300-based BCI systems require a long calibration phase to develop a subject-specific model, which can be inconvenient and time-consuming. Additionally, cross-subject P300 classification is challenging due to significant inter-individual variations.
Method: To address these issues, this study proposes a calibration-free approach for P300 signal detection. Specifically, we incorporate self-distillation along with a beta label smoothing method to enhance model generalization and overall system performance, which not only enables the distillation of informative knowledge from the electroencephalogram (EEG) data of other subjects but also effectively reduces individual variability.
Experimental results: The results on the publicly available OpenBMI dataset demonstrate that the proposed method achieves statistically significantly higher performance compared to state-of-the-art approaches. Notably, the average character recognition accuracy of our method reaches up to 97.37% without the need for calibration. The information transfer rate and visualization results further confirm its effectiveness.
Significance: This method holds great promise for future developments in BCI applications.
{"title":"Text–video retrieval re-ranking via multi-grained cross attention and frozen image encoders","authors":"Zuozhuo Dai , Kaihui Cheng , Fangtao Shao , Zilong Dong , Siyu Zhu","doi":"10.1016/j.patcog.2024.111099","DOIUrl":"10.1016/j.patcog.2024.111099","url":null,"abstract":"<div><div>State-of-the-art methods for text–video retrieval generally leverage CLIP embeddings and cosine similarity for efficient retrieval. Meanwhile, recent advancements in cross-attention techniques introduce transformer decoders to facilitate attention computation between text queries and visual tokens extracted from video frames, enabling a more comprehensive interaction between textual and visual information. In this study, we combine the advantages of both approaches and propose a fine-grained re-ranking approach incorporating a multi-grained text–video cross attention module. Specifically, the re-ranker enhances the top K similar candidates identified by the cosine similarity network. To explore video and text interactions efficiently, we introduce frame and video token selectors to obtain salient visual tokens at both frame and video levels. Then, a multi-grained cross-attention mechanism is applied between text and visual tokens at these levels to capture multimodal information. To reduce the training overhead associated with the multi-grained cross-attention module, we freeze the vision backbone and only train the multi-grained cross attention module. This frozen strategy allows for scalability to larger pre-trained vision models such as ViT-G, leading to enhanced retrieval performance. Experimental evaluations on text–video retrieval datasets showcase the effectiveness and scalability of our proposed re-ranker combined with existing state-of-the-art methodologies.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111099"},"PeriodicalIF":7.5,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating the convergence of concept drift based on knowledge transfer","authors":"Husheng Guo , Zhijie Wu , Qiaoyan Ren , Wenjian Wang","doi":"10.1016/j.patcog.2024.111145","DOIUrl":"10.1016/j.patcog.2024.111145","url":null,"abstract":"<div><div>Concept drift detection and processing is an important issue in streaming data mining. When concept drift occurs, online learning model often cannot quickly adapt to the new data distribution due to the insufficient newly distributed data, which may lead to poor model performance. Currently, most online learning methods adapt to new data distributions after concept drift through autonomous adjustment of the model, but they may often fail to update the model to a stable state quickly. To solve these problems, this paper proposes an accelerating convergence method of concept drift based on knowledge transfer (<span><math><mrow><mi>ACC</mi><mtext>_</mtext><mi>KT</mi></mrow></math></span>). It extracts the most valuable information from the source domain (pre-drift data), and transfers it to the target domain (post-drift data), to realize the update of the ensemble model by knowledge transfer. Besides, different knowledge transfer patterns are adopted to accelerate convergence of model performance when different types concept drift occur. Experimental results show that the proposed method has an obvious acceleration effect on the online learning model after concept drift.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111145"},"PeriodicalIF":7.5,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}