{"title":"WANN-DPC: Density peaks finding clustering based on Weighted Adaptive Nearest Neighbors","authors":"Juanying Xie , Huan Yan , Mingzhao Wang , Philip W. Grant , Witold Pedrycz","doi":"10.1016/j.patcog.2025.111953","DOIUrl":"10.1016/j.patcog.2025.111953","url":null,"abstract":"<div><div>DPC (Density Peak Clustering) algorithm and most of its variants are unable to identify the cluster centers of dense and sparse clusters simultaneously. In addition, the “Domino Effect” of DPC cannot be entirely avoided in its variants. Despite ANN-DPC (Adaptive Nearest Neighbor DPC) being able to detect cluster centers of dense and sparse clusters, its adaptive nearest neighbors of a point may introduce bias in the local density, cluster centers and clustering. To address these limitations of ANN-DPC, the WANN-DPC (Weighted Adaptive Nearest Neighbor DPC) algorithm is proposed. The key contributions of WANN-DPC are as follows: (1) A novel weighted local density of a point is defined by weighting its close and far neighbors, (2) a correction factor is proposed to detect cluster centers in turn, and (3) a two-step assignment strategy is presented utilizing nearest neighbor relationships and weighted membership degrees. Extensive experiments on benchmark datasets demonstrate the superiority of the WANN-DPC over its peers.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 111953"},"PeriodicalIF":7.5,"publicationDate":"2025-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144535801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning region-aware style-content feature transformations for face image beautification","authors":"Zhen Xu, Si Wu","doi":"10.1016/j.patcog.2025.111861","DOIUrl":"10.1016/j.patcog.2025.111861","url":null,"abstract":"<div><div>As a representative image-to-image translation task, facial makeup transfer is typically performed by applying intermediate feature normalization, conditioned on the style information extracted from a reference image. However, the relevant methods are typically limited in range of applicability, due to that the style information is independent of source images and lack of spatial details. To realize precise makeup transfer and further associate with face component editing, we propose a Semantic Region Style-content Feature Transformation approach, which is referred to as SRSFT. Specifically, we encode both reference and source images into region-wise feature vectors and maps, based on semantic segmentation masks. To address the misalignment in poses and expressions, region-wise spatial transformations are inferred to align the reference and source masks, and are then applied to explicitly warp the reference feature maps to the source face, without any extra supervision. The resulting feature maps are fused with the source ones and inserted into a generator for image synthesis. On the other hand, the reference and source feature vectors are also fused and used to determine the modulation parameters at multiple intermediate layers. SRSFT is able to achieve superior beautification performance in terms of seamlessness and fidelity.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 111861"},"PeriodicalIF":7.5,"publicationDate":"2025-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144518994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Boosting zero-shot learning through neuro-symbolic integration","authors":"Francesco Manigrasso, Fabrizio Lamberti, Lia Morra","doi":"10.1016/j.patcog.2025.111869","DOIUrl":"10.1016/j.patcog.2025.111869","url":null,"abstract":"<div><div>Zero-shot learning (ZSL) aims to train deep neural networks to recognize objects from unseen classes, starting from a semantic description of the concepts. Neuro-symbolic (NeSy) integration refers to a class of techniques that incorporate symbolic knowledge representation and reasoning with the learning capabilities of deep neural networks. However, to date, few studies have explored how to leverage NeSy techniques to inject prior knowledge during the training process to boost ZSL capabilities. Here, we present Fuzzy Logic Prototypical Network (FLPN) that formulates the classification task as prototype matching in a visual-semantic embedding space, which is trained by optimizing a NeSy loss. Specifically, FLPN exploits the Logic Tensor Network (LTN) framework to incorporate background knowledge in the form of logical axioms by grounding a first-order logic language as differentiable operations between real tensors. This prior knowledge includes class hierarchies (classes and macroclasses) along with robust high-level inductive biases. The latter allow, for instance, to handle exceptions in class-level attributes and to enforce similarity between images of the same class, preventing premature overfitting to seen classes and improving overall performance. Both class-level and attribute-level prototypes through an attention mechanism specialized for either convolutional- or transformer-based backbones. FLPN achieves state-of-the-art performance on the GZSL benchmarks AWA2 and SUN, matching or exceeding the performance of competing algorithms with minimal computational overhead. The code is available at <span><span>https://github.com/FrancescoManigrass/FLPN</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 111869"},"PeriodicalIF":7.5,"publicationDate":"2025-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144580517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Domain-invariant representation learning via SAM for blood cell classification","authors":"Yongcheng Li , Lingcong Cai , Ying Lu , Cheng Lin , Yupeng Zhang , Jingyan Jiang , Genan Dai , Bowen Zhang , Jingzhou Cao , Xiangzhong Zhang , Xiaomao Fan","doi":"10.1016/j.patcog.2025.112000","DOIUrl":"10.1016/j.patcog.2025.112000","url":null,"abstract":"<div><div>Accurate classification of blood cells is of vital significance in the diagnosis of hematological disorders, facilitating timely treatments for patients. However, in real-world scenarios, domain shifts caused by the variability in laboratory procedures and settings often result in rapid deterioration in model generalization performance. To address this issue, we propose a novel domain-invariant representation learning via the Segment Anything Model (SAM) for blood cell classification, referred to as DoRL. The DoRL comprises two main components: a LoRA-based SAM (LoRA-SAM) and a cross-domain autoencoder (CAE). The key advantage of DoRL is the ability to extract domain-invariant representations from various blood cell datasets in an unsupervised manner. Specifically, we first leverage the large-scale foundation model SAM, fine-tuned with LoRA, to generate robust and transferable visual representations of blood cells. Furthermore, we introduce the CAE to learn domain-invariant representations from the image embeddings across different-domain datasets. The CAE mitigates the impact of image artifacts and other domain-specific variations, ensuring the learned representations more generalizable. To validate the effectiveness of domain-invariant representations, we employ five widely used machine learning classifiers to construct blood cell classification models. Experimental results on two public blood cell datasets and a private real-world dataset demonstrate that our proposed DoRL achieves a new state-of-the-art cross-domain performance, surpassing existing methods by a significant margin. The DoRL, with its novel integration of LoRA-SAM and cross-domain autoencoding, provides a robust and effective solution for enhancing the generalization capabilities of blood cell classification models, potentially improving patient care and outcomes. The source code can be available at <span><span>https://github.com/AnoK3111/DoRL</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 112000"},"PeriodicalIF":7.5,"publicationDate":"2025-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144502132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic example network for class-agnostic object counting","authors":"Xinyan Liu , Guorong Li , Yuankai Qi , Ziheng Yan , Weigang Zhang , Laiyun Qing , Qingming Huang","doi":"10.1016/j.patcog.2025.111998","DOIUrl":"10.1016/j.patcog.2025.111998","url":null,"abstract":"<div><div>This work addresses the class-agnostic counting and localization task, a critical challenge in computer vision where the goal is to count and locate objects of any category in an image using a few annotated examples. The primary challenge arises from the limited information on appearance due to the lack of diverse examples, which hampers the model’s ability to generalize to varied object appearances. To tackle this issue, we propose a dynamic example network (DEN), consisting of a Location and Example Decoder module (LEDM) designed to incrementally expand the set of examples and refine predictions through multiple iterations. Additionally, our negative example mining strategy identifies informative negative examples across the entire dataset, further improving the model’s discriminative capacity. Extensive experiments on five datasets—FSC-147, FSCD-LVIS, CARPARK, UAVCC, and Visdrone—demonstrate the effectiveness of our approach, showing marked improvements over several state-of-the-art methods. The source code and trained models will be publicly accessible to facilitate further research and application in the field.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 111998"},"PeriodicalIF":7.5,"publicationDate":"2025-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144481435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VCGPrompt: Visual Concept Graph-Aware Prompt Learning for Vision-Language Models","authors":"Mengjia Wang , Fang Liu , Licheng Jiao , Shuo Li , Lingling Li , Puhua Chen , Xu Liu , Wenping Ma","doi":"10.1016/j.patcog.2025.112012","DOIUrl":"10.1016/j.patcog.2025.112012","url":null,"abstract":"<div><div>Prompt learning enables efficient fine-tuning of visual-language models (VLMs) like CLIP, demonstrating strong transferability across varied downstream tasks. However, adapting VLMs to open-vocabulary tasks is challenging due to the requirement to recognize diverse unseen data, which can cause overfitting and hinder generalization. To address this, we propose Visual Concept Graph-Aware Prompt Learning (VCGPrompt), which constructs visual concept graphs and uses fine-grained text prompts to enrich the general world knowledge of the model. Additionally, we introduce the Visual Concept Graph Aggregation Module (VCGAM) to prioritize the most distinctive visual concepts of each category and guide the learning of relevant visual features, which enhances the capability to perceive the open world. Our method achieves consistent improvements across three diverse generalization settings, including base-to-new, cross-dataset, and domain generalization, with performance gains of up to 0.95%. These results demonstrate the robustness and broad applicability of our approach under various scenarios. Detailed ablation studies and analyses validate the necessity of fine-grained prompts in the open-vocabulary setting.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 112012"},"PeriodicalIF":7.5,"publicationDate":"2025-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144490576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SpIRL: Spatially-aware image representation learning under the supervision of relative position descriptors","authors":"Logan Servant , Michaël Clément , Laurent Wendling , Camille Kurtz","doi":"10.1016/j.patcog.2025.112013","DOIUrl":"10.1016/j.patcog.2025.112013","url":null,"abstract":"<div><div>Extracting good visual representations from image contents is essential for solving many computer vision problems (e.g. image retrieval, object detection, classification). In this context, state-of-the-art approaches are mainly based on learning a representation using a neural network optimized for a given task. The encoders optimized in this way can then be deployed as backbones for various downstream tasks. When the latter involves reasoning about spatial information from the image content (e.g. retrieve similar structured scenes or compare spatial configurations), this may be suboptimal since models like convolutional neural networks struggle to reason about the relative position of objects in images. Previous studies on building hand-crafted spatial representations, thanks to Relative Position Descriptors (RPD), showed they were powerful to discriminate spatial relations between crisp objects, but such spatial descriptors have rarely been integrated into deep neural networks. We propose in this article different strategies embedded in a common framework called SpIRL (SPatially-aware Image Representation Learning) to guide the optimization of encoders to make them learn more spatial information, under the supervision of an RPD and with the help of a novel dataset (44k images) that does not induce learning semantic information. By using these strategies, we aim to help encoders build more spatially-aware representations. Our experimental results showcase that encoders trained under the SpIRL framework can capture accurate information about the spatial configurations of objects in images on two selected downstream tasks and public datasets.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 112013"},"PeriodicalIF":7.5,"publicationDate":"2025-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144511010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic selection of Gaussian samples for object detection on drone images via shape sensing","authors":"Yixuan Li , Yulong Xu , Renwu Sun , Pengnian Wu , Meng Zhang","doi":"10.1016/j.patcog.2025.111978","DOIUrl":"10.1016/j.patcog.2025.111978","url":null,"abstract":"<div><div>Label assignment (LA) strategy has been extensively studied as a fundamental issue in object detection. However, the drastic scale changes and wide variations in shape (aspect ratio) of objects in drone images result in a sharp performance drop for general LA strategies. To address the above problems, we propose an adaptive Gaussian sample selection strategy for multi-scale objects via shape sensing. Specifically, we first conduct Gaussian modeling for receptive field priors and ground-truth (gt) boxes, ensuring that the non-zero distance metric between any feature point and any ground truth on the whole image is obtained. Subsequently, we theoretically analyze and show that Kullback–Leibler Divergence (KLD) can measure distance according to the characteristics of the object. Taking advantage of this property, we utilize the statistical characteristics of the top-K highest KLD-based matching scores as the positive sample selection threshold for each gt, thereby assigning adequate high-quality samples to multi-scale objects. More importantly, we introduce an adaptive shape-aware strategy that adjusts the sample quantity according to the aspect ratio of objects, guiding the network to balanced learning for multi-scale objects with various shapes. Extensive experiments show that our dynamic shape-aware LA strategy is applicable to a variety of advanced detectors and achieves consistently improved performances on two major benchmarks (i.e., VisDrone and UAVDT), demonstrating the effectiveness of our approach.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 111978"},"PeriodicalIF":7.5,"publicationDate":"2025-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144490562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LiteNeRFAvatar: A lightweight NeRF with local feature learning for dynamic human avatar","authors":"Junjun Pan , Xiaoyu Li , Junxuan Bai , Ju Dai","doi":"10.1016/j.patcog.2025.112008","DOIUrl":"10.1016/j.patcog.2025.112008","url":null,"abstract":"<div><div>Creating high-quality dynamic human avatars within acceptable costs remains challenging in computer vision and computer graphics. The neural radiance field (NeRF) has become a fundamental means of generating human avatars due to its success in novel view synthesis. However, the storage-intensive and time-consuming per-scene training due to the transformation and evaluation of massive sampling points constrains its practical applications. In this paper, we introduce a novel lightweight NeRF model, LiteNeRFAvatar, to overcome these limits. To avoid the high-cost backward transformation of the sampling points, LiteNeRFAvatar decomposes the appearance features of clothed humans into multiple local feature spaces and transforms them forward according to human movements. Each local feature space affects a limited local area and is represented by an explicit feature volume created by the tensor decomposition techniques to support fast access. The sampling points retrieve the features based on the relative positions to the local feature spaces. The densities and the colors are then regressed from the aggregated features using a tiny decoder. We also adopt an empty space skipping strategy to further reduce the number of sampling points. Experimental results demonstrate that our LiteNeRFAvatar achieves a satisfactory balance between synthesis quality, training time, rendering speed and parameter size compared to the existing NeRF-based methods. For the demo of our method, please refer to the link on: <span><span>https://youtu.be/UYfreeHtIZY</span><svg><path></path></svg></span>. The source code will be released after the paper is accepted.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 112008"},"PeriodicalIF":7.5,"publicationDate":"2025-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144481439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UR2P-Dehaze: Learning a Simple Image Dehaze Enhancer via Unpaired Rich Physical Prior","authors":"Minglong Xue , Shuaibin Fan , Shivakumara Palaiahnakote , Mingliang Zhou","doi":"10.1016/j.patcog.2025.111997","DOIUrl":"10.1016/j.patcog.2025.111997","url":null,"abstract":"<div><div>Image dehazing techniques aim to enhance contrast and restore details, which are essential for preserving visual information and improving image processing accuracy. Existing methods may struggle to capture the physical characteristics of images fully and deeply, which could limit their ability to reveal image details. To overcome this limitation, we propose an unpaired image dehazing network, called the Simple Image Dehaze Enhancer via Unpaired Rich Physical Prior (UR2P-Dehaze). First, to accurately estimate the illumination, reflectance, and color information of the hazy image, we design a Shared Prior Estimator (SPE) that is iteratively trained to ensure the consistency of illumination and reflectance, generating clear, high-quality images. Additionally, a self-monitoring mechanism is introduced to eliminate undesirable features, providing reliable priors for image reconstruction. Next, we propose Dynamic Wavelet Separable Convolution (DWSC), which effectively integrates key features across both low and high frequencies, significantly enhancing the preservation of image details and ensuring global consistency. Finally, to effectively restore the color information of the image, we propose an Adaptive Color Corrector that addresses the problem of unclear colors. The PSNR, SSIM, LPIPS, FID and CIEDE2000 metrics on the benchmark dataset show that our method achieves state-of-the-art performance. It also contributes to the performance improvement of downstream tasks. The project code is available at <span><span>https://github.com/Fan-pixel/UR2P-Dehaze</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 111997"},"PeriodicalIF":7.5,"publicationDate":"2025-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144523996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}