{"title":"Heterogeneous graph contrastive learning with spectral augmentation and dual aggregation","authors":"Jing Zhang , Wan Zhang , Xiaoqian Jiang , Yingjie Xie , Yali Yuan , Shunmei Meng , Cangqi Zhou","doi":"10.1016/j.patcog.2025.112505","DOIUrl":"10.1016/j.patcog.2025.112505","url":null,"abstract":"<div><div>Heterogeneous graphs effectively model complex entity relationships in real-world scenarios. However, existing methods primarily focus on topological structures, overlooking the spectral domain, which limits their ability to capture rich, multi-dimensional graph information. Many rely on meta-path schemes to encode semantic details of specific node types, neglecting others and local structural nuances. Thus, they fail to capture comprehensive structural information. To address these issues, a novel combined <u>d</u>ual <u>a</u>ggregation and <u>s</u>pectral <u>a</u>ugmented algorithm, the <u>h</u>eterogeneous <u>g</u>raph <u>c</u>ontrast <u>l</u>earning model (DasaHGCL), is proposed. It applies adaptive spectral augmentation introduced from homogeneous graph learning to the meta-path view of heterogeneous graphs, capturing their spectral invariance for the first time. It also creates an intra-scheme contrast mechanism in dual aggregation algorithms for meta-path and network schema, which circumvents the effect of differences between different aggregation schemes on the model to effectively capture higher-order semantic information and local heterogeneous structural features. Experiments on multiple real-world datasets demonstrate the clear advantages of DasaHGCL.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112505"},"PeriodicalIF":7.6,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145220344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DBL: Dual-Level balanced learning for long-Tailed classification","authors":"Zheng Wu , Kehua Guo , Sheng Ren , Bin Hu , Xiangyuan Zhu , Rui Ding","doi":"10.1016/j.patcog.2025.112448","DOIUrl":"10.1016/j.patcog.2025.112448","url":null,"abstract":"<div><div>Real-world data are typically long-tailed, causing neural networks to over-fit head classes and underperform on rare tails. We propose Dual-Level Balanced Learning (DBL), an efficient training framework that balances gradients at both the class and instance levels. DBL combines Class-aware Balancing (CB), which corrects class-level imbalance by re-weighting gradients according to prediction bias; Instance-aware Balancing (IB), which alleviates instance-level imbalance by emphasising the learning of hard examples; and a lightweight Cross-Level Collaboration (CC) scheme that harmonises the two losses. By jointly addressing class- and instance-level imbalance, DBL delivers consistent gains across all classes and most individual samples. Extensive experiments on CIFAR10/100-LT, ImageNet-LT, Places-LT, and iNaturalist18 show that DBL sets new state-of-the-art accuracy on all five benchmarks, confirming its robustness to severe long-tailed distributions.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112448"},"PeriodicalIF":7.6,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145220481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SeeD: Online similarity-preserving pattern discovery for streaming trajectories","authors":"Junhua Fang , Jiayi Li , Chunhui Feng , Zhicheng Pan , Pingfu Chao , Jiajie Xu , Pengpeng Zhao","doi":"10.1016/j.patcog.2025.112446","DOIUrl":"10.1016/j.patcog.2025.112446","url":null,"abstract":"<div><div>The rapid accumulation of fresh trajectory data has fueled a growing interest in the analysis of such data. There has been a notable economic and social value attributed to effectively uncovering mobility behaviors within rich, streaming trajectory data for applications like urban planning, marketing and intelligence. Despite extensive research on pattern discovery, existing methods often confine themselves to fixed patterns, neglecting the potential synergy between pattern discovery and similarity queries. This synergy can be bidirectional: similarity results could be the foundation of pattern discovery, while pattern discovery can accelerate the similarity queries. To bridge this gap, we propose the Online <u>S</u>imilarity-preserving Traj<u>e</u>ctory Patt<u>e</u>rn <u>D</u>iscovery, called <strong>SeeD</strong>. This framework consists of three core modules: (1) The composite windowing strategy, which extracts multi-scale trajectory information and maintains correlation patterns, ensuring data relevance across various scales. (2) The <u>C</u>lustering-based <u>S</u>imilarity <u>Q</u>uery (CSQ) module, which accelerates similarity computation based on pattern discovery results, thus improving query efficiency. (3) The <u>E</u>volution <u>D</u>etection and <u>A</u>nalysis (EDA) module, which enhances overall performance by analyzing pattern evolution, providing insights into dynamic changes within trajectory data. Extensive experimental results conducted on well-established datasets unequivocally demonstrate the effectiveness of SeeD, indicating its potential to revolutionize the field by offering a robust solution for pattern discovery.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112446"},"PeriodicalIF":7.6,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145220341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CLIP can understand depth","authors":"Sohee Kim , Jisu Kang , Dunam Kim , Seokju Lee","doi":"10.1016/j.patcog.2025.112475","DOIUrl":"10.1016/j.patcog.2025.112475","url":null,"abstract":"<div><div>In this paper, we demonstrate that CLIP can also be adapted to downstream tasks where its vision-language alignment is suboptimally learned during pre-training on web-crawled data, all without requiring fine-tuning. We explore the case of monocular depth estimation, where CLIP’s contrastive prior struggles to generalize, compared to its success in domains such as generative modeling and semantic segmentation. Since CLIP fails to consistently capture similarities between image patches and natural language prompts describing distance, we eliminate the use of its pre-trained natural language token embeddings and distill the semantic prior of its frozen text encoder into a single learnable embedding matrix called <em>“mirror”</em>. The main design goal of <em>mirror</em> is to derive a non-human language prompt that approximates an optimal natural language prompt: “<em>How far is this location from the camera?</em>” Using this approach, we jointly train two lightweight modules, a <em>mirror</em> and a compact decoder, on top of a frozen CLIP for dense depth prediction. Compared to conventional depth models, our framework is significantly more efficient in terms of parameters and computation. The resulting model exhibits impressive performance, matching several state-of-the-art vision models on the NYU Depth v2 and KITTI benchmark datasets, while outperforming all vision-language depth models based on a frozen CLIP prior. Specifically, our method reduces the Absolute Relative Error (Abs Rel) by 68.7 % on NYU Depth v2 and by 75.6 % on KITTI compared to the method of Auty <em>et al.</em>, a representative CLIP-based baseline. Experiments demonstrate that the suboptimal depth understanding of CLIP in terms of spatial and temporal consistency can be significantly corrected without either fine-tuning it or concatenating <em>mirror</em> with its pre-trained subword token embeddings. Furthermore, an ablation study on the convergence status of <em>mirror</em> shows that it is implicitly trained to capture objects, such as humans and windows, where semantic cues play an important role in detection.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112475"},"PeriodicalIF":7.6,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145220346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A collaborative spatial–frequency learning network for infrared and visible image fusion","authors":"Hongbin Yu , Xiangcan Du , Wei Song , Haojie Zhou , Junyi Zhang","doi":"10.1016/j.patcog.2025.112480","DOIUrl":"10.1016/j.patcog.2025.112480","url":null,"abstract":"<div><div>Most existing deep fusion models operate predominantly in the spatial domain, which limits their ability to effectively preserve texture details. In contrast, methods that incorporate frequency-domain information often suffer from inadequate interaction with spatial-domain features, thereby constraining overall fusion performance. To address these limitations, we propose a Collaborative Spatial-Frequency Learning Network (CSFNet) for infrared and visible image fusion. In the frequency-domain learning branch, we introduce a frequency refinement module based on wavelet transform to enable cross-band feature interaction and facilitate effective multi-scale feature fusion. In the spatial-domain branch, we embed a learnable low-rank decomposition model that extracts low-rank features from infrared images and sparse detail features from visible images, forming the basis of a dedicated spatial feature extraction module. Additionally, an information aggregation module is designed to learn complementary representations and integrate cross-domain features efficiently. To validate the effectiveness of the proposed approach, we conducted extensive experiments on three publicly available datasets: MSRS, TNO, and RoadScene, and compared CSFNet with sixteen state-of-the-art (SOTA) fusion methods. On the MSRS dataset, CSFNet achieved favorable results, with a mean and standard deviation of SF = 12.2108 <span><math><mo>±</mo></math></span> 3.8706, VIF = 1.0232 <span><math><mo>±</mo></math></span> 0.1397, Qabf = 0.7112 <span><math><mo>±</mo></math></span> 0.0397, SSIM = 0.6909 <span><math><mo>±</mo></math></span> 0.0859, PSNR = 17.6517 <span><math><mo>±</mo></math></span> 3.8767, and AG = 4.0243 <span><math><mo>±</mo></math></span> 1.5465. The minimum performance improvement over SOTA methods was 1.64 %, while the maximum gain reached 108.82 %. Furthermore, CSFNet demonstrated superior performance on a downstream semantic segmentation task.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112480"},"PeriodicalIF":7.6,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145220369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning hierarchical uncertainty from hybrid representations for neural active reconstruction","authors":"Shuaixian Wang , Yaokun Li , Chenhui Guo , Guang Tan","doi":"10.1016/j.patcog.2025.112493","DOIUrl":"10.1016/j.patcog.2025.112493","url":null,"abstract":"<div><div>Active reconstruction is a key area for the robotics and computer vision communities, enabling autonomous agents to dynamically reconstruct scenes or objects from multiple viewpoints for navigation and manipulation tasks. Although existing methods have achieved promising results in 3D reconstruction, the hierarchical uncertainty-aware active reconstruction based on hybrid implicit representations remains underexplored, particularly in balancing accuracy, efficiency, and adaptability. To address this gap, we propose a neural active reconstruction system that combines hybrid neural representations with uncertainty. Specifically, we explore a novel scheme that integrates occupancy, signed distance function, and neural radiance fields for high-fidelity 3D reconstruction. Additionally, we utilize hierarchical uncertainty associated with different representations to select the next best viewpoint for trajectory planning and optimization. Our system has been extensively evaluated on benchmark datasets including Replica and MP3D, demonstrating qualitatively and quantitatively improved reconstruction quality and view planning efficiency compared to baseline approaches.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112493"},"PeriodicalIF":7.6,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145220374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Error-Resilient incomplete multi-View clustering: Mitigating imputation-induced error accumulation","authors":"Xuanlong Ma , Yanhong She , Fenfang Xie , Guo Zhong","doi":"10.1016/j.patcog.2025.112477","DOIUrl":"10.1016/j.patcog.2025.112477","url":null,"abstract":"<div><div>Incomplete Multi-View Clustering (IMC) plays a pivotal role in integrating and analyzing multi-view data with missing information. Most of existing IMC methods improve clustering performance by inherently incorporating a data recovery step to derive a common representation or consensus graph. However, the imputation of missing data may introduce biased errors, which can accumulate and amplify during iterative optimization, ultimately distorting clustering results. To tackle this critical issue, we propose a novel unified optimization framework that jointly learns data completion and error removal in a mutually reinforcing manner. Specifically, our method introduces a dual-path architecture: one path reconstructs missing views via self-representation, while the other path explicitly models and eliminates biased errors. Crucially, these two components interact via an alternating minimization scheme, enabling them to mutually enhance each other. This synergy effectively reduces error accumulation, leading to a more accurate graph for clustering. Experiments on real-world datasets show that the proposed framework achieves state-of-the-art performance under extremely high missing rates (up to 90 %), significantly reducing error propagation while outperforming existing baselines.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112477"},"PeriodicalIF":7.6,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145220479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Affinity-aware uncertainty quantification for learning with noisy labels","authors":"Zhihao Zhou , Rui Li , Wenjie Ai , Xueying Li , Zhu Teng , Baopeng Zhang , Junwei Du","doi":"10.1016/j.patcog.2025.112495","DOIUrl":"10.1016/j.patcog.2025.112495","url":null,"abstract":"<div><div>Training deep neural networks (DNNs) with noisy labels is a challenging task that significantly degenerates the model’s performance. Most existing methods mitigate this problem by identifying and eliminating noisy samples or correcting their labels according to statistical properties like confidence values. However, these methods often overlook the impact of inherent noise, such as sample quality, which can mislead DNNs to focus on incorrect regions, adulterate the softmax classifier, and generate low-quality pseudo-labels. In this paper, we propose a novel Affinity-aware Uncertainty Quantification (AUQ) framework to explore the perception ambiguity and rectify the salient bias by quantifying the uncertainty. Concretely, we construct the dynamic prototypes to represent intra-class semantic spaces and estimate the uncertainty based on sample-prototype pairs, where the observed affinities between sample-prototype pairs are converted to probabilistic representations as the estimated uncertainty. Samples with higher uncertainty are likely to be hard samples and we design an uncertainty-aware loss to emphasize the learning from those samples with high uncertainty, which helps DNNs to gradually concentrate on the critical regions. Besides, we further utilize sample-prototype affinities to adaptively refine pseudo-labels, enhancing the quality of supervisory signals for noisy samples. Extensive experiments conducted on the CIFAR-10, CIFAR-100 and Clothing1M datasets demonstrate the efficacy and effectiveness of AUQ. Notably, we achieve an average performance gain of 0.4 % on CIFAR-10 and a substantial average improvement of 2.3 % over the second-best method on the more challenging CIFAR-100 dataset. Moreover, there is a 0.6 % improvement over the sub-optimal method on Clothing1M. These results validate AUQ’s capability in enhancing DNN robustness against noisy labels.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112495"},"PeriodicalIF":7.6,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145220480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RGBX-DiffusionDet: a framework for multi-modal RGB-X object detection using DiffusionDet","authors":"Eliraz Orfaig, Inna Stainvas, Igal Bilik","doi":"10.1016/j.patcog.2025.112460","DOIUrl":"10.1016/j.patcog.2025.112460","url":null,"abstract":"<div><div>This work addresses the challenge of object detection using multimodal heterogeneous sensors by extending the recently proposed DiffusionDet framework, initially designed for RGB-only detection. We propose RGBX-DiffusionDet, a generalized diffusion-based object detection framework that enables seamless fusion of heterogeneous 2D modalities (denoted as “X”, e.g., depth, infrared, and polarimetric data) with RGB imagery. The proposed approach adopts a mid-level feature fusion strategy to address the heterogeneous nature of multimodal data, characterized by varying spatial resolutions, noise profiles, and semantic content. Instead of commonly used brute-force feature concatenation, we introduce two novel architectural components: (1) a dynamic channel reduction convolutional block attention module (DCR-CBAM), which enhances cross-modal fusion by emphasizing salient channel features while reducing the dimensionality of merged RGB-X features, and (2) a dynamic multi-level aggregation block (DMLAB), which addresses a limitation of the baseline DiffusionDet decoder by adaptively fusing spatial features to improve object localization. Additionally, we incorporate novel regularization losses that promote channel saliency and spatial selectivity, resulting in compact and discriminative feature embeddings. Extensive experiments on RGB-depth (KITTI), a newly annotated RGB-polarimetric (RGB-P) dataset, and RGB-infrared (M3FD) benchmarks demonstrate consistent superiority of the proposed approach over RGB-only baselines, while maintaining decoding efficiency. We further show that RGBX-DiffusionDet exhibits improved robustness and generalization capability in visually-corrupted conditions, demonstrating its practical efficiency for robust multimodal object detection.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112460"},"PeriodicalIF":7.6,"publicationDate":"2025-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145220345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}