Dynamic example network for class-agnostic object counting
Xinyan Liu, Guorong Li, Yuankai Qi, Ziheng Yan, Weigang Zhang, Laiyun Qing, Qingming Huang
Pattern Recognition, vol. 170, article 111998 (24 June 2025). DOI: 10.1016/j.patcog.2025.111998

Abstract: This work addresses class-agnostic counting and localization, a critical challenge in computer vision where the goal is to count and locate objects of any category in an image using only a few annotated examples. The primary difficulty is the limited appearance information available from so few examples, which hampers the model's ability to generalize to varied object appearances. To tackle this issue, we propose a Dynamic Example Network (DEN), whose Location and Example Decoder Module (LEDM) incrementally expands the set of examples and refines predictions through multiple iterations. Additionally, our negative example mining strategy identifies informative negative examples across the entire dataset, further improving the model's discriminative capacity. Extensive experiments on five datasets (FSC-147, FSCD-LVIS, CARPARK, UAVCC, and VisDrone) demonstrate the effectiveness of our approach, showing marked improvements over several state-of-the-art methods. The source code and trained models will be made publicly accessible to facilitate further research and application in the field.
VCGPrompt: Visual Concept Graph-Aware Prompt Learning for Vision-Language Models
Mengjia Wang, Fang Liu, Licheng Jiao, Shuo Li, Lingling Li, Puhua Chen, Xu Liu, Wenping Ma
Pattern Recognition, vol. 170, article 112012 (24 June 2025). DOI: 10.1016/j.patcog.2025.112012

Abstract: Prompt learning enables efficient fine-tuning of vision-language models (VLMs) such as CLIP, demonstrating strong transferability across varied downstream tasks. However, adapting VLMs to open-vocabulary tasks is challenging because the model must recognize diverse unseen data, which can cause overfitting and hinder generalization. To address this, we propose Visual Concept Graph-Aware Prompt Learning (VCGPrompt), which constructs visual concept graphs and uses fine-grained text prompts to enrich the model's general world knowledge. Additionally, we introduce the Visual Concept Graph Aggregation Module (VCGAM), which prioritizes the most distinctive visual concepts of each category and guides the learning of the relevant visual features, enhancing the model's capability to perceive the open world. Our method achieves consistent improvements across three diverse generalization settings (base-to-new, cross-dataset, and domain generalization), with performance gains of up to 0.95%. These results demonstrate the robustness and broad applicability of our approach under various scenarios, and detailed ablation studies validate the necessity of fine-grained prompts in the open-vocabulary setting.
{"title":"SpIRL: Spatially-aware image representation learning under the supervision of relative position descriptors","authors":"Logan Servant , Michaël Clément , Laurent Wendling , Camille Kurtz","doi":"10.1016/j.patcog.2025.112013","DOIUrl":"10.1016/j.patcog.2025.112013","url":null,"abstract":"<div><div>Extracting good visual representations from image contents is essential for solving many computer vision problems (e.g. image retrieval, object detection, classification). In this context, state-of-the-art approaches are mainly based on learning a representation using a neural network optimized for a given task. The encoders optimized in this way can then be deployed as backbones for various downstream tasks. When the latter involves reasoning about spatial information from the image content (e.g. retrieve similar structured scenes or compare spatial configurations), this may be suboptimal since models like convolutional neural networks struggle to reason about the relative position of objects in images. Previous studies on building hand-crafted spatial representations, thanks to Relative Position Descriptors (RPD), showed they were powerful to discriminate spatial relations between crisp objects, but such spatial descriptors have rarely been integrated into deep neural networks. We propose in this article different strategies embedded in a common framework called SpIRL (SPatially-aware Image Representation Learning) to guide the optimization of encoders to make them learn more spatial information, under the supervision of an RPD and with the help of a novel dataset (44k images) that does not induce learning semantic information. By using these strategies, we aim to help encoders build more spatially-aware representations. Our experimental results showcase that encoders trained under the SpIRL framework can capture accurate information about the spatial configurations of objects in images on two selected downstream tasks and public datasets.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 112013"},"PeriodicalIF":7.5,"publicationDate":"2025-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144511010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamic selection of Gaussian samples for object detection on drone images via shape sensing
Yixuan Li, Yulong Xu, Renwu Sun, Pengnian Wu, Meng Zhang
Pattern Recognition, vol. 170, article 111978 (24 June 2025). DOI: 10.1016/j.patcog.2025.111978

Abstract: Label assignment (LA) has been extensively studied as a fundamental issue in object detection. However, the drastic scale changes and wide variations in shape (aspect ratio) of objects in drone images cause a sharp performance drop for general LA strategies. To address these problems, we propose an adaptive Gaussian sample selection strategy for multi-scale objects via shape sensing. Specifically, we first conduct Gaussian modeling of receptive-field priors and ground-truth (gt) boxes, ensuring a non-zero distance measure between any feature point and any ground truth over the whole image. We then analyze theoretically and show that the Kullback-Leibler Divergence (KLD) measures distance in accordance with the characteristics of the object. Exploiting this property, we use the statistical characteristics of the top-K highest KLD-based matching scores as the positive-sample selection threshold for each gt, thereby assigning sufficient high-quality samples to multi-scale objects. More importantly, we introduce an adaptive shape-aware strategy that adjusts the sample quantity according to the aspect ratio of objects, guiding the network toward balanced learning of multi-scale objects with various shapes. Extensive experiments show that our dynamic shape-aware LA strategy is applicable to a variety of advanced detectors and achieves consistently improved performance on two major benchmarks (VisDrone and UAVDT), demonstrating the effectiveness of our approach.
{"title":"LiteNeRFAvatar: A lightweight NeRF with local feature learning for dynamic human avatar","authors":"Junjun Pan , Xiaoyu Li , Junxuan Bai , Ju Dai","doi":"10.1016/j.patcog.2025.112008","DOIUrl":"10.1016/j.patcog.2025.112008","url":null,"abstract":"<div><div>Creating high-quality dynamic human avatars within acceptable costs remains challenging in computer vision and computer graphics. The neural radiance field (NeRF) has become a fundamental means of generating human avatars due to its success in novel view synthesis. However, the storage-intensive and time-consuming per-scene training due to the transformation and evaluation of massive sampling points constrains its practical applications. In this paper, we introduce a novel lightweight NeRF model, LiteNeRFAvatar, to overcome these limits. To avoid the high-cost backward transformation of the sampling points, LiteNeRFAvatar decomposes the appearance features of clothed humans into multiple local feature spaces and transforms them forward according to human movements. Each local feature space affects a limited local area and is represented by an explicit feature volume created by the tensor decomposition techniques to support fast access. The sampling points retrieve the features based on the relative positions to the local feature spaces. The densities and the colors are then regressed from the aggregated features using a tiny decoder. We also adopt an empty space skipping strategy to further reduce the number of sampling points. Experimental results demonstrate that our LiteNeRFAvatar achieves a satisfactory balance between synthesis quality, training time, rendering speed and parameter size compared to the existing NeRF-based methods. For the demo of our method, please refer to the link on: <span><span>https://youtu.be/UYfreeHtIZY</span><svg><path></path></svg></span>. The source code will be released after the paper is accepted.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 112008"},"PeriodicalIF":7.5,"publicationDate":"2025-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144481439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UR2P-Dehaze: Learning a Simple Image Dehaze Enhancer via Unpaired Rich Physical Prior","authors":"Minglong Xue , Shuaibin Fan , Shivakumara Palaiahnakote , Mingliang Zhou","doi":"10.1016/j.patcog.2025.111997","DOIUrl":"10.1016/j.patcog.2025.111997","url":null,"abstract":"<div><div>Image dehazing techniques aim to enhance contrast and restore details, which are essential for preserving visual information and improving image processing accuracy. Existing methods may struggle to capture the physical characteristics of images fully and deeply, which could limit their ability to reveal image details. To overcome this limitation, we propose an unpaired image dehazing network, called the Simple Image Dehaze Enhancer via Unpaired Rich Physical Prior (UR2P-Dehaze). First, to accurately estimate the illumination, reflectance, and color information of the hazy image, we design a Shared Prior Estimator (SPE) that is iteratively trained to ensure the consistency of illumination and reflectance, generating clear, high-quality images. Additionally, a self-monitoring mechanism is introduced to eliminate undesirable features, providing reliable priors for image reconstruction. Next, we propose Dynamic Wavelet Separable Convolution (DWSC), which effectively integrates key features across both low and high frequencies, significantly enhancing the preservation of image details and ensuring global consistency. Finally, to effectively restore the color information of the image, we propose an Adaptive Color Corrector that addresses the problem of unclear colors. The PSNR, SSIM, LPIPS, FID and CIEDE2000 metrics on the benchmark dataset show that our method achieves state-of-the-art performance. It also contributes to the performance improvement of downstream tasks. The project code is available at <span><span>https://github.com/Fan-pixel/UR2P-Dehaze</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 111997"},"PeriodicalIF":7.5,"publicationDate":"2025-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144523996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Caudo-Diff: Diffusion calibrated pseudo labels in guided latent space for minimally supervised medical segmentation","authors":"Baoqi Yu, Yong Liu","doi":"10.1016/j.patcog.2025.112007","DOIUrl":"10.1016/j.patcog.2025.112007","url":null,"abstract":"<div><div>Accurate segmentation of medical images is essential for clinical diagnosis and treatment planning. However, deep learning-based segmentation models are data-intensive, requiring large, well-annotated datasets which is an often challenging and costly requirement in medical fields. To reduce the reliance on manual labeling, we propose the minimally supervision based on an exemplar, leveraging only a single labeled sample while making full use of the remaining unlabeled data. In this case, two challenges need to be addressed. First, a lack of sufficient prior information: relying solely on a single exemplar limits the model’s ability to capture complex semantics. Second, the unreliability of pseudo labels: noise and inaccuracies in these labels introduce bias, hindering segmentation performance. To overcome these challenges, we propose a new pseudo-labeling paradigm by diffusion calibration. Follow this paradigm, we introduce Caudo-Diff, a novel method for calibrating pseudo labels using a deterministic diffusion model in a guided latent space, aiming to supplement prior information and improve pseudo-labeling reliability. Initial pseudo labels and features extracted by the segmentation network guide the model to focus on meaningful semantic regions. The pseudo labels are then refined to reduce noise and errors, enhancing segmentation accuracy. Experimental results show that Caudo-Diff improves segmentation performance with minimal supervision, offering a practical solution to the challenge of annotation scarcity in medical image segmentation.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 112007"},"PeriodicalIF":7.5,"publicationDate":"2025-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144519058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unaligned multi-view clustering via diversified anchor graph fusion
Hongyu Jiang, Hong Tao, Zhangqi Jiang, Chenping Hou
Pattern Recognition, vol. 170, article 111977 (23 June 2025). DOI: 10.1016/j.patcog.2025.111977

Abstract: Clear sample correspondence across views is a key presupposition of traditional multi-view clustering. However, in practical applications, uncertainties during the data collection process may violate this presupposition, producing unaligned multi-view data. In this paper, to overcome the obstacle to multi-view fusion posed by unaligned samples and achieve efficient unaligned multi-view clustering, we propose a novel Diversified Anchor Graph Fusion (DAGF) method. Specifically, view-specific bipartite graphs with diversified anchors are constructed to adapt to the characteristics of unaligned multi-view data. Then, with the devised sample alignment and anchor integration strategy, these bipartite graphs are fused to learn a joint bipartite graph with an explicit cluster membership structure. The proposed DAGF method not only overcomes the adverse effects of unaligned samples on cross-view information fusion but also preserves complementary view-specific clustering structure information, enabling efficient and effective clustering. Systematic experimental results on real-world datasets demonstrate the advantages of DAGF in both clustering performance and computational complexity. Code is available at https://github.com/revolution6575/DAGF.git.
{"title":"Learning embedded label-specific features for partial multi-label learning","authors":"Xiaohan Xu , Hao Wang , Jialu Yao , Zan Zhang","doi":"10.1016/j.patcog.2025.112005","DOIUrl":"10.1016/j.patcog.2025.112005","url":null,"abstract":"<div><div>Partial multi-label learning (PML) aims to learn from instances with weak supervision, where each instance is associated with a set of candidate labels, among which only a subset is valid. Most existing approaches rely on identical feature representations to distinguish all class labels, overlooking the inherent distinctiveness of different labels, which leads to suboptimal model performance. Although recent studies have attempted to address this limitation by tailoring label-specific features, critical shortcomings remain: (1) isolated processing of feature tailoring and label disambiguation fails to leverage their synergistic relationship, and (2) direct extraction of label-specific features from the original feature space tends to yield unreliable results due to inherent noise and disturbances. This paper proposes a unified PML framework that jointly performs label disambiguation, embedded label-specific feature learning, and model induction. Within this framework, identifying ground-truth labels and generating label-specific features mutually reinforce each other, leading to continuous refinement. By customizing features from a compact and noise-free embedded space, the framework further ensures robustness and reliability in learning. Specifically, low-rank and sparse decomposition is employed to separate ground-truth labels from noisy ones, while a linear embedding discriminant model simultaneously generates embedded label-specific features and induces the model. Moreover, we enhance the classifier’s accuracy by assuming that the input and output spaces share local geometric structures, encouraging similar instances to have similar label sets. Extensive experiments on sixty-six real-world and synthetic datasets demonstrate that the proposed approach significantly outperforms state-of-the-art methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 112005"},"PeriodicalIF":7.5,"publicationDate":"2025-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144481442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nearest-neighbor class prototype prompt and simulated logits for continual learning
Yue Lu, Jie Tan, Shizhou Zhang, Yinghui Xing, Guoqiang Liang, Yanning Zhang
Pattern Recognition, vol. 170, article 111933 (23 June 2025). DOI: 10.1016/j.patcog.2025.111933

Abstract: Continual learning allows a single model to acquire knowledge from a sequence of tasks within a non-static data stream without succumbing to catastrophic forgetting. Vision transformers, pre-trained on extensive datasets, have recently made prompt-based methods viable as exemplar-free alternatives to methods reliant on rehearsal. Nonetheless, the majority of these methods employ a key-value query system for integrating pertinent prompts, which may result in the keys becoming stuck in local minima. To counter this, we propose a straightforward nearest-neighbor class prototype search approach for deducing task labels, which improves the alignment with appropriate prompts. Additionally, we boost task-label inference accuracy by embedding prompts within the query function itself, enabling better feature extraction from the samples. To further minimize inter-task confusion in cross-task classification, we incorporate simulated logits into the classifier during training. These logits emulate strong responses from other tasks, aiding the refinement of the classifier's decision boundaries. Our method outperforms many existing prompt-based approaches, setting a new state-of-the-art record on three widely used class-incremental learning datasets.