{"title":"Knowledge-enhanced and structure-enhanced representation learning for protein–ligand binding affinity prediction","authors":"Mei Li , Ye Cao , Xiaoguang Liu , Hua Ji","doi":"10.1016/j.patcog.2025.111701","DOIUrl":"10.1016/j.patcog.2025.111701","url":null,"abstract":"<div><div>Protein–ligand binding affinity (PLA) prediction is a fundamental preliminary stage in drug discovery and development. Existing methods mainly focus on structure-free prediction of binding affinities and the investigation of structural PLA prediction is not fully explored yet. Spatial structures of protein–ligand complexes are critical in determining binding affinities. A few graph neural network (GNN) based methods model spatial structures of complexes with pairwise atomic distances within a cutoff, which provides insufficient spatial descriptions and limits their capabilities in distinguishing between certain molecules. In this paper, we propose a knowledge-enhanced and structure-enhanced representation learning method (KSM) for structural PLA prediction. The proposed KSM has a specially designed structure-based GNN (KSGNN) to learn complete representations for PLA prediction by combining sequence and structure information of complexes. Notably, KSGNN is capable of learning structure-aware representations via incorporating relative spatial information of distances and angles among atoms into the message passing. Additionally, we adopt an attentive pooling layer (APL) to further refine structural patterns in complexes. We compare KSM against 18 state-of-the-art baselines on two benchmarks. KSM outperforms its competitors with improvements of 0.0536 and 0.19 on the PDBbind core set and the CSAR-HiQ dataset, respectively, in terms of the metric of RMSE, demonstrating its superiority in binding affinity prediction.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111701"},"PeriodicalIF":7.5,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143863414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OIL-AD: An anomaly detection framework for decision-making sequences","authors":"Chen Wang , Sarah Erfani , Tansu Alpcan , Christopher Leckie","doi":"10.1016/j.patcog.2025.111656","DOIUrl":"10.1016/j.patcog.2025.111656","url":null,"abstract":"<div><div>Anomaly detection in decision-making sequences is a challenging problem due to the complexity of normality representation learning and the sequential nature of the task. Most existing methods based on Reinforcement Learning (RL) are difficult to implement in the real world due to unrealistic assumptions, such as having access to environment dynamics, reward signals, and online interactions with the environment. To address these limitations, we propose an unsupervised method named Offline Imitation Learning based Anomaly Detection (OIL-AD), which detects anomalies in decision-making sequences using two extracted behaviour features: <em>action optimality</em> and <em>sequential association</em>. Our offline learning model is an adaptation of behavioural cloning with a transformer policy network, where we modify the training process to learn a Q function and a state value function from normal trajectories. We propose that the Q function and the state value function can provide sufficient information about agents’ behavioural data, from which we derive two features for anomaly detection. The intuition behind our method is that the <em>action optimality</em> feature derived from the Q function can differentiate the optimal action from others at each local state, and the <em>sequential association</em> feature derived from the state value function has the potential to maintain the temporal correlations between decisions (state–action pairs). Our experiments show that OIL-AD can achieve outstanding online anomaly detection performance with up to 34.8% improvement in <span><math><msub><mrow><mi>F</mi></mrow><mrow><mn>1</mn></mrow></msub></math></span> score over comparable baselines. The source code is available on <span><span>https://github.com/chenwang4/OILAD</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111656"},"PeriodicalIF":7.5,"publicationDate":"2025-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143867751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sample selection for noisy partial label learning with interactive contrastive learning","authors":"Xiaotong Yu , Shiding Sun , Yingjie Tian","doi":"10.1016/j.patcog.2025.111681","DOIUrl":"10.1016/j.patcog.2025.111681","url":null,"abstract":"<div><div>In the context of weakly supervised learning, partial label learning (PLL) addresses situations where each training instance is associated with a set of partial labels, with only one being accurate. However, in complex realworld tasks, the restrictive assumption may be invalid which means the ground-truth may be outside the candidate label set. In this work, we loose the constraints and address the noisy label problem for PLL. First, we introduce a selection strategy, which enables deep models to select clean samples via the loss values of flipped and original images. Besides, we progressively identify the true labels of the selected samples and ensemble two models to acquire the knowledge of unselected samples. To extract better feature representations, we introduce pseudo-labeled interactive contrastive learning to aggregate cross-network information of all samples. Experimental results verify that our approach surpasses baseline methods on noisy PLL task with different levels of label noise.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111681"},"PeriodicalIF":7.5,"publicationDate":"2025-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143859332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cross-domain person re-identification via learning Heterogeneous Pseudo Labels","authors":"Zhong Zhang, Di He, Shuang Liu","doi":"10.1016/j.patcog.2025.111702","DOIUrl":"10.1016/j.patcog.2025.111702","url":null,"abstract":"<div><div>Assigning pseudo labels is vital for cross-domain person re-identification (ReID), and most existing methods only assign one kind of pseudo labels to unlabeled target domain samples, which cannot describe these unlabeled samples accurately due to large intra-class and small inter-class variances caused by diverse environmental factors, such as occlusions, illuminations, viewpoints, and poses, etc. In this paper, we propose a novel label learning method named Heterogeneous Pseudo Labels (HPL) for cross-domain person ReID, which could overcome large intra-class and small inter-class variances between pedestrian images in the target domain. For each unlabeled target domain sample, HPL simultaneously learns three different kinds of pseudo labels, i.e., fine-grained labels, coarse-grained labels, and instance labels. With the three kinds of labels, we could make full use of their own advantages to describe target domain samples from different perspectives. Meanwhile, we propose the Pseudo Labels Constraint (PLC) to improve the quality of the heterogeneous labels by using their consistency. Furthermore, in order to relieve the influence of noisy labels from the aspect of contrastive learning, we propose the Confidence Contrastive Loss (CCL) to consider the sample confidence in the learning process. Extensive experiments on four cross-domain tasks demonstrate that the proposed method achieves a new state-of-the-art performance, for example, the proposed method achieves 87.2% mAP and 95.0% Rank-1 accuracy on MSMT17<span><math><mo>→</mo></math></span>Market.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111702"},"PeriodicalIF":7.5,"publicationDate":"2025-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143868774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cross-domain distribution adversarial diffusion model for synthesizing contrast-enhanced abdomen CT imaging","authors":"Qikui Zhu , Shaoming Zhu , Bo Du , Yanqing Wang","doi":"10.1016/j.patcog.2025.111695","DOIUrl":"10.1016/j.patcog.2025.111695","url":null,"abstract":"<div><div>Synthesizing contrast-enhanced CT imaging (CE-CT imaging) from non-contrast CT imaging (NC-CT) without the need for chemical contrast agents (CAs) injection holds significant clinical value, as CE-CT imaging plays a crucial role in diagnosing liver tumors, especially in identifying and distinguishing benign from malignant liver tumors. However, challenges within CT imaging, such as the low variability in intensity distribution and limited distribution changes, have hindered the effectiveness of existing synthetic methods, including GAN-based methods and diffusion model (DM)-based methods, in synthesizing CE-CT imaging. We propose a novel cross-domain distribution adversarial diffusion model (AdverDM) for CE-CT imaging synthesis, which overcomes the aforementioned challenges and facilitates the synthesis of CE-CT imaging. Our AdverDM incorporates three key innovations: (1) Cross-domain distribution adversarial learning is introduced into DM, enabling the utilization of cross-domain information to learn discriminative feature representations, addressing the limitations of existing DM based methods in capturing conceptually-aware discriminative features and extracting CA-aware feature representations. (2) A content-oriented diffusion model is creatively designed to guide tissue distribution learning, assisting DM in overcoming the challenge of low variability in intensity distribution. (3) A novel structure preservation loss is proposed to maintain the structural information, avoiding the problem of structural destruction faced by DMs. AdverDM is validated using corresponding two-modality CT images (pre-contrast and portal-venous phases), which is a clinically important procedure that benefits liver tumor biopsy. Experimental results (PSNR: 24.78, SSIM: 0.83, MAE: 6.94) demonstrate that our AdverDM successfully synthesizes CE-CT imaging without the need for chemical CAs injection. Moreover, AdverDM’s performance surpasses that of state-of-the-art synthetic methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111695"},"PeriodicalIF":7.5,"publicationDate":"2025-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143863501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cross-scene visual context parsing with large vision-language model","authors":"Guoqing Zhang , Shichao Kan , Lu Shi , Wanru Xu , Gaoyun An , Yigang Cen","doi":"10.1016/j.patcog.2025.111641","DOIUrl":"10.1016/j.patcog.2025.111641","url":null,"abstract":"<div><div>Relation analysis is crucial for image-based applications such as visual reasoning and visual question answering. Current relation analysis such as scene graph generation (SGG) only focuses on building relationships among objects within a single image. However, in real-world applications, relationships among objects across multiple images, as seen in video understanding, may hold greater significance as they can capture global information. This is still a challenging and unexplored task. In this paper, we aim to explore the technique of Cross-Scene Visual Context Parsing (CS-VCP) using a large vision-language model. To achieve this, we first introduce a cross-scene dataset comprising 10,000 pairs of cross-scene visual instruction data, with each instruction describing the common knowledge of a pair of cross-scene images. We then propose a Cross-Scene Visual Symbiotic Linkage (CS-VSL) model to understand both cross-scene relationships and objects by analyzing the rationales in each scene. The model is pre-trained on 100,000 cross-scene image pairs and validated on 10,000 image pairs. Both quantitative and qualitative experiments demonstrate the effectiveness of the proposed method. Our method has been released on GitHub: <span><span>https://github.com/gavin-gqzhang/CS-VSL</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111641"},"PeriodicalIF":7.5,"publicationDate":"2025-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143868773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ricci curvature discretizations for head pose estimation from a single image","authors":"Andrea Francesco Abate, Lucia Cascone, Michele Nappi","doi":"10.1016/j.patcog.2025.111648","DOIUrl":"10.1016/j.patcog.2025.111648","url":null,"abstract":"<div><div>Head pose estimation (HPE) is crucial in various real-world applications, like human–computer interaction and biometric framework enhancement. This research aims to leverage network curvature to predict head pose from a single image. In networks, certain groups of nodes fulfill significant functional roles. This study focuses on the interactions of facial landmarks, considered as vertices in a weighted graph. The experiments demonstrate that the underlying graph geometry and topology enable the detection of similarities among various head poses. Two independent notions of discrete Ricci curvature for graphs, namely Ollivier–Ricci and Forman–Ricci curvatures, are investigated. These two types of Ricci curvature, each reflecting distinct geometric properties of the network, serve as inputs to the regression model. The results from the BIWI, AFLW2000, and Pointing‘04 datasets reveal that the two discretizations of Ricci’s curvature are closely related and outperform state-of-the-art methods, including both landmark-based and image-only approaches. This demonstrates the effectiveness and promise of using network curvature for HPE in diverse applications.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111648"},"PeriodicalIF":7.5,"publicationDate":"2025-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143863637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gradient-based class weighting for unsupervised domain adaptation in dense prediction visual tasks","authors":"Roberto Alcover-Couso , Marcos Escudero-Viñolo, Juan C. SanMiguel, Jesus Bescos","doi":"10.1016/j.patcog.2025.111633","DOIUrl":"10.1016/j.patcog.2025.111633","url":null,"abstract":"<div><div>In unsupervised domain adaptation (UDA), where models are trained on source data (e.g., synthetic) and adapted to target data (e.g., real-world) without target annotations, addressing the challenge of significant class imbalance remains an open issue. Despite progress in bridging the domain gap, existing methods often experience performance degradation when confronted with highly imbalanced dense prediction visual tasks like semantic segmentation. This discrepancy becomes especially pronounced due to the lack of equivalent priors between the source and target domains, turning class imbalanced techniques used for other areas (e.g., image classification) ineffective in UDA scenarios. This paper proposes a class-imbalance mitigation strategy that incorporates class-weights into the UDA learning losses, with the novelty of estimating these weights dynamically through the gradients of the per-class losses, defining a Gradient-based class weighting (GBW) approach. The proposed GBW naturally increases the contribution of classes whose learning is hindered by highly-represented classes, and has the advantage of automatically adapting to training outcomes, avoiding explicit curricular learning patterns common in loss-weighing strategies. Extensive experimentation validates the effectiveness of GBW across architectures (Convolutional and Transformer), UDA strategies (adversarial, self-training and entropy minimization), tasks (semantic and panoptic segmentation), and datasets. Analysis shows that GBW consistently increases the recall of under-represented classes.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111633"},"PeriodicalIF":7.5,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143839392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unpaired recurrent learning for real-world video de-hazing","authors":"Prashant W. Patil , Santosh Nagnath Randive , Sunil Gupta , Santu Rana , Svetha Venkatesh , Subrahmanyam Murala","doi":"10.1016/j.patcog.2025.111698","DOIUrl":"10.1016/j.patcog.2025.111698","url":null,"abstract":"<div><div>Automated outdoor vision-based applications have become increasingly in demand for day-to-day life. Bad weather like haze, rain, snow, <em>etc.</em> may limit the reliability of these applications due to degradation in the overall video quality. So, there is a dire need to pre-process the weather-degraded videos before they are fed to downstream applications. Researchers generally adopt synthetically generated paired hazy frames for learning the task of video de-hazing. The models trained solely on synthetic data may have limited performance on different types of real-world hazy scenarios due to significant domain gap between synthetic and real-world hazy videos. One possible solution is to prove the generalization ability by training on unpaired data for video de-hazing. Some unpaired learning approaches are proposed for single image de-hazing. However, these unpaired single image de-hazing approaches compromise the performance in terms of temporal consistency, which is important for video de-hazing tasks. With this motivation, we have proposed a lightweight and temporally consistent architecture for video de-hazing tasks. To achieve this, diverse receptive and multi-scale features at various input resolutions are mixed and aggregated with multi-kernel attention to extract significant haze information. Furthermore, we propose a recurrent multi-attentive feature alignment concept to maintain temporal consistency with recurrent feedback of previously restored frames for temporal consistent video restoration. Comprehensive experiments are conducted on real-world and synthetic video databases (REVIDE and RSA100Haze). Both the qualitative and quantitative results show significant improvement of the proposed network with better temporal consistency over state-of-the-art methods for detailed video restoration in hazy weather. Source code is available at: <span><span>https://github.com/pwp1208/UnpairedVideoDehazing</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111698"},"PeriodicalIF":7.5,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143859331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conformal e-prediction","authors":"Vladimir Vovk","doi":"10.1016/j.patcog.2025.111674","DOIUrl":"10.1016/j.patcog.2025.111674","url":null,"abstract":"<div><div>This paper discusses a counterpart of conformal prediction for e-values, <em>conformal e-prediction</em>. Conformal e-prediction is conceptually simpler and had been developed in the 1990s as a precursor of conformal prediction. When conformal prediction emerged as result of replacing e-values by p-values, it seemed to have important advantages over conformal e-prediction without obvious disadvantages. This paper re-examines relations between conformal prediction and conformal e-prediction systematically from a modern perspective. Conformal e-prediction has advantages of its own, such as the ease of designing conditional conformal e-predictors and the guaranteed validity of cross-conformal e-predictors (whereas for cross-conformal predictors validity is only an empirical fact and can be broken with excessive randomization). Even where conformal prediction has clear advantages, conformal e-prediction can often emulate those advantages, more or less successfully.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111674"},"PeriodicalIF":7.5,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143843012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}