{"title":"Generalizable person re-identification method using bi-stream interactive learning with feature reconstruction","authors":"Feng Min, Yuhui Liu, Yixin Mao","doi":"10.1016/j.patcog.2025.111591","DOIUrl":"10.1016/j.patcog.2025.111591","url":null,"abstract":"<div><div>Recent studies have shown that metric learning and representation learning are two main methods to improve the generalization ability of pedestrian re-identification models. However, their relationship has not been fully explored. Unlike GANs’ emphasis on adversarial learning, our objective is to develop an interactive and synergistic learning framework for the two. To achieve this, we propose a generalizable pedestrian re-identification method using bi-stream interactive learning. One of the learning streams is the correlation graph sampler (CGS) for metric learning, and the other learning stream is the global sparse attention network (GSANet) for representation learning. We establish an intrinsic connection between these two learning streams. Unlike many existing methods that have high memory and computation costs or lack learning ability, CGS provides a more efficient and effective solution. CGS uses locality-sensitive hashing and feature metrics to construct the nearest neighbor graph for all categories at the beginning of training, which ensures that each batch of training samples contains randomly selected base categories and their nearest neighbor categories, providing strongly similar and challenging training examples. As CGS sampling performance is affected by the quality of the feature map, we propose a global feature sparse reconstruction module to enhance the global self-correlation of the feature map extracted by the backbone network. Additionally, we extensively evaluate our method on large-scale datasets, including CUHK03, Market-1501, and MSMT17, and our method outperforms current state-of-the-art methods. 
These results confirm the effectiveness of our method and demonstrate its potential in pedestrian re-identification applications.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111591"},"PeriodicalIF":7.5,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143697611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
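The abstract above describes CGS as building a nearest-neighbor graph over categories so each batch pairs base categories with their nearest-neighbor categories. A minimal sketch of that batch-construction idea, assuming class centroids as the "feature metrics" and omitting the paper's locality-sensitive hashing (the function name and parameters are illustrative, not the authors' API):

```python
import numpy as np

def build_neighbor_batch(class_feats, num_base=1, k=1, seed=0):
    """Sketch of a correlation-graph-style sampler: for each randomly
    chosen base class, add its k nearest-neighbor classes (by Euclidean
    distance between class centroid features) to the batch."""
    rng = np.random.default_rng(seed)
    n = len(class_feats)
    # pairwise distances between class centroids
    d = np.linalg.norm(class_feats[:, None, :] - class_feats[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a class is not its own neighbor
    knn = np.argsort(d, axis=1)[:, :k]   # k nearest classes per class
    base = rng.choice(n, size=num_base, replace=False)
    batch = set(base.tolist())
    for b in base:
        batch.update(knn[b].tolist())
    return sorted(batch)
```

In practice the graph is built once at the start of training (as the abstract states), so the O(n²) distance matrix here would be replaced by the paper's hashing scheme for large label sets.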
{"title":"Real-world nighttime image dehazing using contrastive and adversarial learning","authors":"Jingwen Deng , Patrick P.K. Chan , Daniel S. Yeung","doi":"10.1016/j.patcog.2025.111596","DOIUrl":"10.1016/j.patcog.2025.111596","url":null,"abstract":"<div><div>Nighttime image dehazing is a challenging task due to the scarcity of real hazy images and the domain gap between synthetic and real data. To address these challenges, we propose a novel deep learning framework that integrates contrastive and adversarial learning. In the initial training phase, the dehazing generator is trained on synthetic data to produce dehazed images that closely match the ground truths while maintaining a significant distance from the original hazy images through contrastive learning. Simultaneously, the contrastive learning encoder is updated to enhance its ability to distinguish between the dehazed images and ground truths, thereby increasing the difficulty of the dehazing task and pushing the generator to fully exploit feature information for improved results. To bridge the gap between synthetic and real data, the model is fine-tuned using a small set of real hazy images. To mitigate bias from the limited amount of real data, an additional constraint is applied to regulate model adjustments during fine-tuning. 
Empirical evaluation on multiple benchmark datasets demonstrates that our model outperforms state-of-the-art methods, providing an effective solution for improving visibility in hazy nighttime images by effectively leveraging both synthetic and real data.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111596"},"PeriodicalIF":7.5,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143680924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D microvascular reconstruction in retinal OCT angiography images via domain-adaptive learning","authors":"Jiong Zhang , Shuai Yu , Yonghuai Liu , Dan Zhang , Jianyang Xie , Tao Chen , Yalin Zheng , Huazhu Fu , Yitian Zhao","doi":"10.1016/j.patcog.2025.111494","DOIUrl":"10.1016/j.patcog.2025.111494","url":null,"abstract":"<div><div>Optical Coherence Tomography Angiography (OCTA) is a non-invasive imaging technique that enables the acquisition of 3D depth-resolved information with micrometer resolution, facilitating the diagnosis of various eye-related diseases. In OCTA-based image analysis, 2D <em>en face</em> projected images are commonly used for quantifying microvascular changes, while the 3D images with rich depth information remain largely unexplored. This is mainly because direct 3D vessel reconstruction faces several challenges, including projection artifacts, complex vessel topology, and high computational cost. These limitations hinder comprehensive microvascular analysis and may obscure potentially vital 3D vessel biomarkers. In this study, we propose a novel method for 3D reconstruction of retinal microvasculature using 2D <em>en face</em> images. Our approach capitalizes on an elaborately generated 2D OCTA depth map for vessel reconstruction, thus eliminating the need for 3D volumetric data, which is unavailable in certain retinal imaging devices. More specifically, we first build a structure-guided depth prediction network which incorporates a domain adaptation module to evaluate the depth maps obtained from different OCTA imaging devices. A point-cloud-to-surface reconstruction method is then utilized to reconstruct the corresponding 3D retinal vessels, based on the predicted depth maps and 2D vascular information. Experimental results demonstrate the superior performance of our method in comparison to existing state-of-the-art techniques. 
Furthermore, we extract 3D vessel-related features to assess disease correlation and classification, effectively evaluating the potential of our method for guiding subsequent clinical analysis. The results show promise of exploring 3D microvascular analysis for early diagnosis of various eye-related diseases.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111494"},"PeriodicalIF":7.5,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143680923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
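The core lifting step described above, combining a 2D en face vessel map with a per-pixel depth map to obtain 3D vessel points, can be sketched as follows (a simplified stand-in for the paper's predicted depth and point-cloud-to-surface pipeline; the function name is illustrative):

```python
import numpy as np

def depth_to_points(vessel_mask, depth):
    """Lift a 2D en face vessel mask into 3D points: each nonzero mask
    pixel (x, y) becomes a point (x, y, depth[y, x])."""
    ys, xs = np.nonzero(vessel_mask)
    return np.stack([xs, ys, depth[ys, xs]], axis=1)
```

The resulting point cloud would then feed a surface reconstruction step, which the sketch omits.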
{"title":"Domain consistency learning for continual test-time adaptation in image semantic segmentation","authors":"Yanyu Ye , Wei Wei , Lei Zhang , Chen Ding , Yanning Zhang","doi":"10.1016/j.patcog.2025.111585","DOIUrl":"10.1016/j.patcog.2025.111585","url":null,"abstract":"<div><div>In the open-world scenario, the challenge of distribution shift persists. Test-time adaptation adjusts the model during test-time to fit the target domain’s data, addressing the distribution shift between the source and target domains. However, test-time adaptation methods still face significant challenges with continuously changing data distributions, especially since there are few methods applicable to continual test-time adaptation in image semantic segmentation. Furthermore, inconsistent semantic representations across different domains result in catastrophic forgetting in continual test-time adaptation. This paper focuses on the problem of continual test-time adaptation in semantic segmentation tasks and proposes a method named domain consistency learning for continual test-time adaptation. We mitigate catastrophic forgetting through feature-level and prediction-level consistency learning. Specifically, we propose domain feature consistency learning and class awareness consistency learning to guide model learning, enabling the target domain model to extract generalized knowledge. Additionally, to mitigate error accumulation, we propose a novel value-based sample selection method that jointly considers the pseudo-label confidence and style representativeness of the test images. 
Extensive experiments on widely-used semantic segmentation benchmarks demonstrate that our approach achieves satisfactory performance compared to state-of-the-art methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111585"},"PeriodicalIF":7.5,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143643868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
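The abstract combines feature-level and prediction-level consistency to fight forgetting. The exact losses are not given there; a generic sketch under the assumption of a mean-squared feature term plus a cross-entropy term against an anchor model's predictions (names and weighting are illustrative):

```python
import numpy as np

def consistency_loss(feat_adapted, feat_anchor, prob_adapted, prob_anchor):
    """Toy consistency objective: keep adapted features close to anchor
    features (feature level) and adapted predictions close to anchor
    predictions via cross-entropy (prediction level)."""
    feat_term = np.mean((feat_adapted - feat_anchor) ** 2)
    pred_term = -np.mean(np.sum(prob_anchor * np.log(prob_adapted + 1e-8), axis=-1))
    return feat_term + pred_term
```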
{"title":"Self-supervised polarization image dehazing method via frequency domain generative adversarial networks","authors":"Rui Sun , Long Chen , Tanbin Liao , Zhiguo Fan","doi":"10.1016/j.patcog.2025.111615","DOIUrl":"10.1016/j.patcog.2025.111615","url":null,"abstract":"<div><div>Haze significantly hinders the application of autonomous driving, traffic surveillance, and remote sensing. Image dehazing serves as a key technology to enhance the clarity of images captured in hazy conditions. However, the lack of paired annotated training data significantly limits the performance of deep learning-based dehazing methods in real-world scenarios. In this work, we propose a self-supervised polarization image dehazing framework based on frequency domain generative adversarial networks. By incorporating a polarization calculation module into the generator, the Stokes parameters of airlight are accurately estimated, and these are combined with the dehazed image generated by a densely connected encoder-decoder to reconstruct a synthesized hazy image. Furthermore, we optimize the discriminator with frequency domain features extracted by a frequency decomposition module and introduce a pseudo airlight coefficient supervision loss to enhance the self-supervised training. By discriminating between synthetic hazy images and real hazy images, we achieve adversarial training without the need for paired data. Simultaneously, supervised by the atmospheric scattering model, our network can iteratively generate more realistic dehazed images. 
Extensive experiments conducted on the constructed multi-view polarization datasets demonstrate that our method achieves state-of-the-art performance without requiring real-world ground truth.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111615"},"PeriodicalIF":7.5,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143680987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
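The atmospheric scattering model that supervises the network above is the standard I = J·t + A·(1 − t): a hazy image I is the scene radiance J attenuated by transmission t plus airlight A. The re-synthesis step can be sketched as:

```python
import numpy as np

def synthesize_haze(J, t, A):
    """Atmospheric scattering model I = J*t + A*(1 - t): re-synthesize a
    hazy image from a dehazed estimate J, transmission map t, and
    airlight A (all in [0, 1])."""
    return J * t + A * (1.0 - t)
```

In the self-supervised loop, the synthesized hazy image is compared against the real hazy input, so no clean ground truth is needed.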
{"title":"Transformer-driven feature fusion network and visual feature coding for multi-label image classification","authors":"Pingzhu Liu , Wenbin Qian , Jintao Huang , Yanqiang Tu , Yiu-Ming Cheung","doi":"10.1016/j.patcog.2025.111584","DOIUrl":"10.1016/j.patcog.2025.111584","url":null,"abstract":"<div><div>Multi-label image classification (MLIC) has attracted extensive research attention in recent years. Nevertheless, most of the existing methods have difficulty in effectively fusing multi-scale features and focusing on critical visual information, which makes it difficult to recognize objects from images. Besides, recent studies have utilized graph convolutional networks and attention mechanisms to model label dependencies in order to improve the model performance. However, these methods often rely on manually predefined label structures, which limits flexibility and model generality. They also fail to capture intrinsic object correlations within images and spatial contexts. To address these challenges, we propose a novel Feature Fusion network combined with Transformer (FFTran) to fuse different visual features. Firstly, to address the difficulties of current methods in recognizing small objects, we propose a Multi-level Scale Information Integration Mechanism (MSIIM) that fuses different feature maps from the backbone network. Secondly, we develop an Intra-Image Spatial-Channel Semantic Mining (ISCM) module to learn important spatial and channel information. Thirdly, we design a Visual Feature Coding based on Transformer (VFCT) module to enhance the contextual information by pooling different visual features. 
Compared to the baseline model, FFTran achieves a significant boost in mean Average Precision (mAP) on both the VOC2007 and COCO2014 datasets, with enhancements of 2.9% and 5.1% respectively, highlighting its superior performance in multi-label image classification tasks.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"164 ","pages":"Article 111584"},"PeriodicalIF":7.5,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143654599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
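The mAP figures quoted above average per-label average precision (AP) over all labels. For reference, a minimal AP computation for a single label (ranking images by score and averaging precision at each positive's rank):

```python
import numpy as np

def average_precision(scores, labels):
    """Average precision for one label: mean of precision@k over the
    ranks k at which a positive example appears."""
    order = np.argsort(-scores)            # rank images by descending score
    hits = labels[order]                   # 1 where a positive lands at that rank
    ranks = np.nonzero(hits)[0] + 1        # 1-based ranks of positives
    precisions = np.cumsum(hits)[ranks - 1] / ranks
    return precisions.mean()
```

mAP is then the mean of this quantity across labels (e.g., the 20 VOC2007 or 80 COCO2014 categories).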
{"title":"Eye-SCAN: Eye-Movement-Attention-based Spatial Channel Adaptive Network for traffic accident prediction","authors":"Xiaohui Yang , Yu Qiao , Tongzhen Si , Jing Wang , Tao Xu","doi":"10.1016/j.patcog.2025.111590","DOIUrl":"10.1016/j.patcog.2025.111590","url":null,"abstract":"<div><div>In the task of using visual cues extracted from DashCam video data to predict future accidents, understanding the dynamic spatio-temporal interactions in driving scenarios poses a major challenge. Given that the gaze attention information of experienced drivers during the driving process involves complex spatio-temporal interactions, this information can provide valuable guidance for training accident prediction models. Therefore, we propose an Eye-Movement-Attention-based Spatial Channel Adaptive Network (Eye-SCAN) for traffic accident prediction, which can efficiently learn multi-scale spatial channel information from driver gaze data. To integrate potential guidance information from driver eye movement information (EyeInfo) into Eye-SCAN, we propose two sub-modules in our model: the Spatial Adaptive Module (SAM), which helps Eye-SCAN adaptively learn low-dimensional spatial features of EyeInfo; and the Channel Adaptive Module (CAM), which helps Eye-SCAN adaptively learn high-dimensional channel features of EyeInfo. Additionally, we introduce a novel recursive transmission strategy for temporal information to mitigate the impact of varying past results on the model’s current inferences. 
Experimental results demonstrate that our model outperforms state-of-the-art methods on two benchmark datasets, highlighting the contributions of each component and offering an effective solution for enhancing the safety of intelligent vehicles.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111590"},"PeriodicalIF":7.5,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143697612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
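The abstract does not specify the recursive transmission strategy in detail; as a generic illustration of the idea, a simple exponential-smoothing recursion blends each new per-frame estimate with the carried state so that fluctuating past results do not dominate the current one (all names and the smoothing form are assumptions, not the authors' method):

```python
def recursive_state(observations, alpha=0.5):
    """Toy recursive transmission: state_t = alpha*state_{t-1} +
    (1-alpha)*obs_t, damping the influence of varying past results."""
    state = 0.0
    out = []
    for obs in observations:
        state = alpha * state + (1 - alpha) * obs
        out.append(state)
    return out
```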
{"title":"Partial multi-label feature selection based on label distribution learning","authors":"Yaojin Lin , Yulin Li , Shidong Lin , Lei Guo , Yu Mao","doi":"10.1016/j.patcog.2025.111523","DOIUrl":"10.1016/j.patcog.2025.111523","url":null,"abstract":"<div><div>Partial Multi-label Learning (PML) induces a multi-classifier in an imprecise supervised environment, where the candidate labels associated with each training sample are partially valid. The high-dimensional feature space, presented in PML data accompanied by ambiguous labeling information, is a significant challenge for learning. In this paper, we propose a PML feature selection method based on Label Distribution Learning (LDL), which handles the above challenges by correcting misleading labels and then selecting common and label-specific features. In the first procedure, the error distribution hypothesis is constructed, which divides the structure of ambiguous label information into minority and majority error distributions according to the amount of error that may appear in the data annotation process. Under the analysis of the hypothesis, the label credibility distribution data (LCDD) is generated by identifying and correcting errors, where the fractional score of each label associated with each training sample describes the probability that the label belongs to that sample. In the second procedure, a discriminative feature subset is selected for PML based on LCDD by common and label-specific feature constraints. 
Experiments on three synthetic and five real PML datasets demonstrate the effectiveness of the proposed method.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"164 ","pages":"Article 111523"},"PeriodicalIF":7.5,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143654603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
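The credibility distribution described above assigns each candidate label a fractional score interpreted as the probability that it is a true label. How the paper derives those scores is not given in the abstract; as an illustration only, one could normalize model scores over the candidate set (a softmax restricted to candidate labels; the construction is an assumption, not the authors' method):

```python
import numpy as np

def label_credibility(scores, candidate_mask):
    """Toy credibility distribution: restrict scores to the candidate
    label set and normalize so each label's fraction reads as the
    probability that it is a true label of the sample."""
    masked = np.where(candidate_mask, np.exp(scores), 0.0)
    return masked / masked.sum(axis=-1, keepdims=True)
```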
{"title":"Online Signature Verification based on the Lagrange formulation with 2D and 3D robotic models","authors":"Moises Diaz , Miguel A. Ferrer , Juan M. Gil , Rafael Rodriguez , Peirong Zhang , Lianwen Jin","doi":"10.1016/j.patcog.2025.111581","DOIUrl":"10.1016/j.patcog.2025.111581","url":null,"abstract":"<div><div>Online Signature Verification commonly relies on function-based features, such as time-sampled horizontal and vertical coordinates, as well as the pressure exerted by the writer, obtained through a digitizer. Although inferring additional information about the writer’s arm pose, kinematics, and dynamics based on digitizer data can be useful, it constitutes a challenge. In this paper, we tackle this challenge by proposing a new set of features based on the dynamics of online signatures. These new features are inferred through a Lagrangian formulation, obtaining the sequences of generalized coordinates and torques for 2D and 3D robotic arm models. By combining kinematic and dynamic robotic features, our results demonstrate their significant effectiveness for online automatic signature verification, achieving state-of-the-art results when integrated into deep learning models.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"164 ","pages":"Article 111581"},"PeriodicalIF":7.5,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143682026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
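The Lagrangian features above require generalized joint coordinates recovered from pen-tip positions. The paper's 2D/3D arm models and torque derivation are more involved; as a minimal sketch of the kinematic step only, planar two-link inverse kinematics (elbow-down solution, unit link lengths assumed for illustration):

```python
import numpy as np

def two_link_ik(x, y, l1=1.0, l2=1.0):
    """Planar two-link inverse kinematics: recover joint angles (q1, q2)
    placing the arm tip at pen position (x, y), elbow-down branch."""
    c2 = (x**2 + y**2 - l1**2 - l2**2) / (2 * l1 * l2)
    q2 = np.arccos(np.clip(c2, -1.0, 1.0))          # elbow angle
    q1 = np.arctan2(y, x) - np.arctan2(l2 * np.sin(q2), l1 + l2 * np.cos(q2))
    return q1, q2
```

Differentiating such joint trajectories twice and plugging them into the arm's Lagrangian would yield the torque sequences the paper uses as dynamic features.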
{"title":"Granular-ball computing-based Random Walk for anomaly detection","authors":"Sihan Wang , Zhong Yuan , Shitong Cheng , Hongmei Chen , Dezhong Peng","doi":"10.1016/j.patcog.2025.111588","DOIUrl":"10.1016/j.patcog.2025.111588","url":null,"abstract":"<div><div>Anomaly detection is a key task in data mining, which has been successfully employed in many practical scenarios. However, most existing methods usually analyze the anomalous characteristics of samples at a single and finest granularity, which leads to high computational cost and low efficiency. As one of the significant mathematical models in the theory of granular computing, granular-ball computing can portray the distributional characteristics of data from a multi-granularity perspective. For this reason, this paper proposes an unsupervised anomaly detection method based on granular-ball computing. Firstly, the samples are covered by generating adaptive granular-balls, and the multi-granularity information represented by granular-balls with different sizes can reflect the data distribution characteristics of the corresponding region. Secondly, the granular-balls are used to fit the samples and construct a state transition matrix for a random walk. Then, the steady-state distribution is generated using iterative computation and is normalized as the degree of anomaly for each granular-ball. Finally, the anomaly score for each sample is computed by relating the anomaly degree of each granular-ball to the samples it covers. Comparative experiments show that the proposed anomaly detection method performs well on multiple datasets, demonstrating its feasibility and superiority in practical applications. 
The code is publicly available online at <span><span>https://github.com/optimusprimeyy/GBRAD</span></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111588"},"PeriodicalIF":7.5,"publicationDate":"2025-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143680912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
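The random-walk step in the abstract above (iterate a transition matrix to its steady-state distribution, then normalize it into anomaly degrees) can be sketched as follows; the granular-ball construction and the ball-to-sample score assignment are omitted, and the min-max normalization is an illustrative choice:

```python
import numpy as np

def random_walk_scores(P, iters=200):
    """Power-iterate a row-stochastic transition matrix P to its
    steady-state distribution; low stationary probability marks a
    granular-ball as anomalous (scores min-max normalized to [0, 1])."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])   # start from uniform
    for _ in range(iters):
        pi = pi @ P                              # one random-walk step
    # lower steady-state mass -> higher anomaly score
    return 1.0 - (pi - pi.min()) / (pi.max() - pi.min())
```

A weakly connected ball receives little steady-state mass and hence a high score, matching the intuition that the random walk rarely visits isolated regions.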