Pattern Recognition: latest articles, page 10

Learning to complement with multiple humans

IF 7.6, Zone 1 (Computer Science)

Pattern Recognition, Pub Date: 2025-09-01, DOI: 10.1016/j.patcog.2025.112376

Zheng Zhang, Cuong Nguyen, Kevin Wells, Thanh-Toan Do, Gustavo Carneiro

Volume 172, Article 112376

Abstract: Human-AI collaborative classification (HAI-CC), a solution for addressing real-world image classification challenges, aims to synergise the efficiency of machine learning classifiers and the reliability of human experts to support decision making. Learning to defer (L2D) has been one of the promising HAI-CC approaches, where the system assesses a sample and decides to defer to one of the human experts when it is not confident. Despite recent progress, existing L2D methods rely on the strong assumption that ground truth labels are available for training, while in practice most datasets contain multiple noisy annotations per data sample without well-curated ground truth labels. In addition, current L2D methods either consider the setting of a single human expert or defer the decision to one human expert even though multiple experts may be available, resulting in suboptimal utilisation of available resources. Furthermore, current HAI-CC evaluation frameworks often overlook processing costs, making it difficult to assess the trade-off between computational efficiency and performance when benchmarking different methods. To address these gaps, this paper introduces LECOMH, a new HAI-CC method that learns from noisy labels without depending on clean labels for training, simultaneously maximising collaborative accuracy with either one or multiple human experts while minimising the cost of human collaboration. The paper also introduces benchmarks featuring multiple noisy labels per data sample for both training and testing to evaluate HAI-CC methods. Through quantitative comparisons on these benchmarks, LECOMH consistently outperforms competing HAI-CC methods and baselines, including human experts alone, multi-rater learning and noisy-label learning methods, across both synthetic and real-world datasets.
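The deferral idea described in the abstract above can be illustrated with a minimal sketch. This is not LECOMH itself, whose selection mechanism is learned and is not specified in this listing; it assumes a fixed confidence threshold, a hypothetical `collaborate` function, and a simple majority vote between the classifier and the queried experts.

```python
from collections import Counter


def collaborate(probs, expert_labels, threshold=0.8, max_experts=2):
    """Return (decision, number_of_experts_queried) for one sample.

    Assumption: a fixed confidence threshold stands in for a learned
    deferral policy. All names here are hypothetical.
    """
    # Classifier prediction is the index of the largest class probability.
    ai_label = max(range(len(probs)), key=probs.__getitem__)
    if probs[ai_label] >= threshold:
        return ai_label, 0  # confident: no human cost incurred

    # Not confident: query up to max_experts humans and take a majority vote
    # that also counts the classifier's own prediction.
    queried = expert_labels[:max_experts]
    votes = Counter(queried)
    votes[ai_label] += 1
    top = max(votes.values())
    winners = sorted(label for label, count in votes.items() if count == top)
    # On a tie, keep the classifier's label if it is among the winners.
    decision = ai_label if ai_label in winners else winners[0]
    return decision, len(queried)
```

The second return value makes the cost of human collaboration explicit, so accuracy can be reported alongside the number of expert queries.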

Citations: 0

Vision-by-prompt: Context-aware dual prompts for composed video retrieval

IF 7.6, Zone 1 (Computer Science)

Pattern Recognition, Pub Date: 2025-09-01, DOI: 10.1016/j.patcog.2025.112378

Hao Wang, Fang Liu, Licheng Jiao, Jiahao Wang, Shuo Li, Lingling Li, Puhua Chen, Xu Liu

Volume 172, Article 112378

Abstract: Composed video retrieval (CoVR) is the challenging task of retrieving relevant videos from a corpus by using a query that integrates both a relative change text and a reference video. Most existing CoVR models simply rely on a late-fusion strategy to combine the visual input and the change text. Furthermore, various methods have been proposed to generate pseudo-word tokens from the reference video, which are then integrated into the relative change text for CoVR. However, these pseudo-word-based techniques exhibit limitations when the target video involves complex changes from the reference video, e.g., object removal. In this work, we propose a novel CoVR framework that learns context information via context-aware dual prompts for the relative change text to achieve effective composed video retrieval. The dual prompts cater to two aspects: 1) global descriptive prompts generated from pretrained V-L models, e.g., BLIP-2, to obtain concise textual representations of the reference video; 2) local target prompts to learn the target representations that the change text pays attention to. By connecting these prompts with the relative change text, one can easily use existing text-to-video retrieval models to enhance CoVR performance. Our proposed framework can be flexibly used for both composed video retrieval (CoVR) and composed image retrieval (CoIR) tasks. Moreover, we take a pioneering approach by adopting the CoVR model to achieve zero-shot CoIR for remote sensing. Experiments on four datasets show that our approach achieves state-of-the-art performance in both CoVR and zero-shot CoIR tasks, with improvements of up to about 3.5% in recall@1.
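The recall@K metric quoted in the abstract above can be computed as in the following sketch. It assumes a precomputed query-by-candidate similarity matrix and one ground-truth target index per query; the function name `recall_at_k` and the data layout are illustrative and are not taken from the paper.

```python
def recall_at_k(similarity, targets, k=1):
    """Fraction of queries whose target is ranked within the top k candidates.

    similarity[i][j] is the score between query i and candidate j
    (higher means more similar); targets[i] is the correct candidate index.
    """
    hits = 0
    for scores, target in zip(similarity, targets):
        # Rank candidate indices from most to least similar.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(targets)


sim = [
    [0.9, 0.1, 0.0],  # query 0: target 0 is ranked first
    [0.2, 0.3, 0.5],  # query 1: target 1 is ranked second
    [0.1, 0.8, 0.4],  # query 2: target 2 is ranked second
]
targets = [0, 1, 2]
```

With this toy data only the first query is answered correctly at K=1, while all three are answered correctly at K=2.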

Citations: 0

MCoCa: Towards fine-grained multimodal control in image captioning

IF 7.6, Zone 1 (Computer Science)

Pattern Recognition, Pub Date: 2025-09-01, DOI: 10.1016/j.patcog.2025.112381

Shanshan Zhao, Teng Wang, Jinrui Zhang, Xiangchen Wang, Feng Zheng

Volume 172, Article 112381

Abstract: Controllable image captioning (CIC) models have traditionally focused on generating controlled descriptions using specific text styles. However, these approaches are limited as they rely solely on text control signals, which often fail to align with complex human intentions, such as selecting specific areas in images. To enhance multimodal interactivity, we propose to augment current CIC systems with diverse and joint visual-text controls. To achieve this, we first create a comprehensive Multimodal Controllable Image Captioning Corpus (MCoCa) dataset by leveraging the language-rewriting ability of GPT-3.5, containing 0.97M image-caption pairs along with 21 visual-text control signals. By training the visual and textual adapters attached to a multimodal large language model with newly proposed instructional prompts on MCoCa, we observe emergent combinatory multimodal controllability and a significant improvement in text controllability. We present exhaustive quantitative and qualitative results, benchmarking our trained model's state-of-the-art zero-shot captioning performance on SentiCap and FlickrStyle10K in terms of both fidelity and controllability. For the regional understanding ability of visually controlled captioning, our method achieves a clear improvement over the baseline models.

Citations: 0

A variable Gaussian kernel scale active contour model based on Jeffreys divergence for ICT image segmentation

IF 7.6, Zone 1 (Computer Science)

Pattern Recognition, Pub Date: 2025-09-01, DOI: 10.1016/j.patcog.2025.112384

Zexin Liu, Qi Li, Junyao Wang, Tingyuan Deng, Rifeng Zhou, Yufang Cai, Fenglin Liu

Volume 172, Article 112384

Abstract: In industrial computed tomography (ICT), factors like beam scattering, insufficient beam intensity, and detector dark current often lead to weak edges, scattering artifacts, and severe Gaussian noise in ICT images. These issues pose significant difficulties for accurate segmentation of high-density complex structures using existing active contour models (ACMs). To address these limitations, this paper presents a variable Gaussian kernel scale active contour model based on Jeffreys divergence (VGJD). Firstly, the Jeffreys divergence (JD) is incorporated into the energy function to replace the conventional Euclidean distance, enhancing the contour's ability to quantify pixel value disparity during evolution. Additionally, a filter weight is introduced to minimize the impact of noise. Moreover, a variable Gaussian kernel scale strategy is adopted to effectively integrate both global and local image information, thereby enhancing robustness to the initial contour and improving the precision of detail segmentation. Finally, optimized length and regularity terms are employed to enforce constraints on the level set function. Extensive experimental results demonstrate that the VGJD model can effectively segment various complex ICT images, achieving superior precision in comparison to other ACMs. The code is available at https://github.com/LiuZX599/ACM-VGJD.git
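For readers unfamiliar with the Jeffreys divergence mentioned above, it is the symmetrised Kullback-Leibler divergence, J(P, Q) = KL(P || Q) + KL(Q || P) = sum_i (p_i - q_i) log(p_i / q_i). The sketch below evaluates it for two discrete distributions; how the paper applies it to pixel values inside its energy function is not stated in this listing, so the function name and the `eps` smoothing term are assumptions made for illustration only.

```python
import math


def jeffreys_divergence(p, q, eps=1e-12):
    """Symmetrised KL divergence between two discrete distributions.

    eps is an illustrative smoothing constant that avoids log(0);
    it is not taken from the paper.
    """
    return sum(
        (pi - qi) * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
    )


uniform = [0.5, 0.5]
skewed = [0.9, 0.1]
d_forward = jeffreys_divergence(uniform, skewed)
d_backward = jeffreys_divergence(skewed, uniform)
```

Unlike the plain KL divergence, the result does not depend on argument order, so `d_forward` and `d_backward` agree; the value is zero only when the two distributions coincide.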

Citations: 0

Robust spatio-temporal graph neural networks with sparse structure learning

IF 7.6, Zone 1 (Computer Science)

Pattern Recognition, Pub Date: 2025-08-31, DOI: 10.1016/j.patcog.2025.112383

Yupei Zhang, Yuxin Li, Shuhui Liu, Xuequn Shang

Volume 172, Article 112383

Abstract: This paper focuses on the problem of spatio-temporal graph classification by introducing sparse structure learning to enhance its robustness and explainability. Spatio-temporal graph neural networks (STGNNs) integrate spatial structure and temporal sequential features into GNN learning, resulting in promising performance in many applications. However, current STGNN models often fail to capture the discriminative sparse substructure and the smooth distribution of these samples. To this end, this paper introduces RostGNN, robust spatio-temporal graph neural networks, for achieving more discriminative graph representations. Concretely, RostGNN extracts the spatial and temporal features by applying gated recurrent units to the given time series data and calculating adjacency matrices for graphs. Then, we impose the iterative hard-thresholding approach on the final association matrix to obtain a sparse graph. Meanwhile, we calculate a similarity matrix from the side information of samples to smooth the learned data representations and use fully connected networks for graph classification. Finally, we apply RostGNN to brain graph classification in experiments on real-world datasets. The results demonstrate that RostGNN delivers robust and discriminative graph representations and outperforms the compared methods, benefiting from the sparsity and manifold regularizers. Furthermore, RostGNN can potentially yield useful findings for data understanding.
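The sparsification step mentioned in the abstract above relies on hard thresholding. In iterative hard thresholding, a gradient update is typically followed by a projection that keeps only the k largest-magnitude entries. The sketch below shows that projection alone on a small association matrix; the function name `hard_threshold` and the choice of k are hypothetical, and the surrounding training loop used by RostGNN is not described in this listing.

```python
def hard_threshold(matrix, k):
    """Keep the k largest-magnitude entries of a matrix and zero the rest.

    This is only the projection step of iterative hard thresholding;
    the gradient update that precedes it is omitted.
    """
    entries = [
        (abs(value), i, j)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
    ]
    # Stable sort by descending magnitude; ties keep row-major order.
    entries.sort(key=lambda entry: -entry[0])
    keep = {(i, j) for _, i, j in entries[: max(k, 0)]}
    return [
        [value if (i, j) in keep else 0.0 for j, value in enumerate(row)]
        for i, row in enumerate(matrix)
    ]


association = [
    [0.9, -0.1, 0.3],
    [0.05, -0.7, 0.2],
]
sparse = hard_threshold(association, k=2)
```

Selecting entries by index rather than by a magnitude cutoff guarantees that exactly k entries survive even when several magnitudes are tied.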

Citations: 0

Cross-channel blur invariants of color and multispectral images

IF 7.6, Zone 1 (Computer Science)

Pattern Recognition, Pub Date: 2025-08-31, DOI: 10.1016/j.patcog.2025.112358

Václav Košík, Jan Flusser, Filip Šroubek

Citations: 0

Ultra-efficient 3D shape reconstruction: Line-coded absolute phase unwrapping algorithm

IF 7.6, Zone 1 (Computer Science)

Pattern Recognition, Pub Date: 2025-08-31, DOI: 10.1016/j.patcog.2025.112366

Haihua An, Yiping Cao, Hechen Zhang

Volume 172, Article 112366

Abstract: Absolute phase unwrapping-based fringe projection profilometry (APU-FPP) has the advantages of pixel-wise calculation, high precision, and full-field sensing of 3D shape information. To the best of our knowledge, existing APU-FPP methods generally face a conflict between accuracy and efficiency because they project extra auxiliary coded fringes (ACFs). In this paper, a line-coded absolute phase unwrapping (LCAPU) algorithm is presented for absolute 3D shape reconstruction of scenes with non-uniform reflectivity and complex surfaces. Firstly, a sequence of single-pixel lines is successively embedded into two sets of 3-step phase-shifting patterns to mark fringe periods, which entirely avoids extra ACFs that would disrupt the coherence of adjacent morphological information. Secondly, two line-coded phase-shifting patterns with the same phase shift are used to recognize the corresponding coded lines containing the fringe order cue, which can simultaneously be used to guide fringe mutual compensation, thereby extracting a high-quality phase. Finally, according to the pixel positions and the fringe indices of the decoded lines, a multi-layer decoding (MLD) algorithm is developed to iteratively generate a fringe order map, which can adapt to the randomness of morphological changes. Compared to other methods, the proposed LCAPU can not only perform a one-shot 3D shape reconstruction with a single image acquisition, but also automatically correct phase errors, balancing ultra-efficiency and high accuracy. Experimental results demonstrate its superior performance and practical application potential in dynamic complex scenes.
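As background for the 3-step phase-shifting patterns mentioned above, the sketch below shows the textbook three-step formula for the wrapped phase and how a fringe order turns it into an absolute phase. It assumes phase shifts of -2π/3, 0 and +2π/3, which the listing does not confirm, and it does not reproduce the paper's line coding or multi-layer decoding. All function names are hypothetical.

```python
import math

SHIFT = 2.0 * math.pi / 3.0  # assumed phase step between the three patterns


def simulate_intensities(phi, background=0.5, modulation=0.4):
    """Ideal intensities of one pixel under shifts of -SHIFT, 0, +SHIFT."""
    return tuple(
        background + modulation * math.cos(phi + delta)
        for delta in (-SHIFT, 0.0, SHIFT)
    )


def wrapped_phase(i1, i2, i3):
    """Textbook three-step formula; the result lies in (-pi, pi]."""
    return math.atan2(math.sqrt(3.0) * (i1 - i3), 2.0 * i2 - i1 - i3)


def absolute_phase(phi, fringe_order):
    """Unwrap by adding 2*pi times the integer fringe order."""
    return phi + 2.0 * math.pi * fringe_order


recovered = wrapped_phase(*simulate_intensities(1.0))
recovered_negative = wrapped_phase(*simulate_intensities(-2.5))
```

The wrapped phase is ambiguous up to multiples of 2π, which is why a per-pixel fringe order map, however it is obtained, is needed before converting phase to 3D shape.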

Citations: 0

discDC: Unsupervised Discriminative Deep Image Clustering via Confidence-Driven Self-Labeling

IF 7.6, Zone 1 (Computer Science)

Pattern Recognition, Pub Date: 2025-08-31, DOI: 10.1016/j.patcog.2025.112382

Jinyu Cai, Wenzhong Guo, Yunhe Zhang, Jicong Fan

Volume 172, Article 112382

Abstract: Deep clustering, an important research topic in machine learning and data mining, has been widely applied in many real-world scenarios. However, existing deep clustering methods primarily rely on implicit optimization objectives such as contrastive learning or reconstruction, which do not explicitly enforce cluster-level discrimination. This restricts their ability to achieve compact intra-cluster structures and distinct inter-cluster separations. To overcome this limitation, we propose a novel unsupervised discriminative deep clustering (discDC) method, which explicitly integrates cluster-level discrimination into the learning process. The proposed discDC framework projects data into a nonlinear latent space with compact and well-separated cluster representations. It explicitly optimizes clustering objectives by minimizing intra-cluster discrepancy and maximizing inter-cluster discrepancy. Additionally, to tackle the lack of label information in unsupervised scenarios, we introduce a confidence-driven self-labeling mechanism, which iteratively derives reliable pseudo-labels to enhance discriminative analysis. Extensive experiments on five benchmark datasets demonstrate the superiority of discDC over state-of-the-art deep clustering approaches.
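The two ingredients named in the abstract above, confidence-driven pseudo-labels and intra-/inter-cluster discrepancy, can be sketched on toy data as follows. This is an illustration under stated assumptions rather than the discDC objective: the confidence threshold, the use of squared Euclidean distances to centroids, and both function names are hypothetical.

```python
def confident_pseudo_labels(soft_assignments, threshold=0.9):
    """Map sample index -> cluster for samples assigned with high confidence."""
    labels = {}
    for index, probs in enumerate(soft_assignments):
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            labels[index] = best
    return labels


def discrepancies(features, labels):
    """Return (intra, inter) discrepancy over the pseudo-labelled samples.

    intra: mean squared distance of each sample to its cluster centroid.
    inter: mean squared distance between pairs of cluster centroids.
    """
    clusters = {}
    for index, label in labels.items():
        clusters.setdefault(label, []).append(features[index])
    centroids = {
        label: [sum(column) / len(points) for column in zip(*points)]
        for label, points in clusters.items()
    }

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    intra = sum(
        squared_distance(features[i], centroids[label]) for i, label in labels.items()
    ) / len(labels)
    names = sorted(centroids)
    pairs = [(a, b) for n, a in enumerate(names) for b in names[n + 1:]]
    inter = (
        sum(squared_distance(centroids[a], centroids[b]) for a, b in pairs) / len(pairs)
        if pairs
        else 0.0
    )
    return intra, inter


features = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
soft = [[0.95, 0.05], [0.92, 0.08], [0.03, 0.97], [0.55, 0.45]]
pseudo = confident_pseudo_labels(soft)
intra, inter = discrepancies(features, pseudo)
```

The last sample is left unlabelled because its assignment is ambiguous, so it does not influence either discrepancy; a discriminative objective would then push `intra` down and `inter` up.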

Citations: 0

Distantly supervised reinforcement localization for real-world object distribution estimation

IF 7.6, Zone 1 (Computer Science)

Pattern Recognition, Pub Date: 2025-08-31, DOI: 10.1016/j.patcog.2025.112385

Haojie Guo, Junyu Gao, Yuan Yuan

Volume 172, Article 112385

Abstract: Predicting the distribution of objects in the real world from monocular images is a challenging task due to the disparity between object distributions in perspective images and reality. Many researchers focus on predicting object distributions by converting perspective images into Bird's-Eye View (BEV) images. In scenarios where camera parameter information is unavailable, the prediction of vanishing lines becomes critical for performing inverse perspective transformations. However, accurately predicting vanishing lines requires accounting for variations in object size, which cannot be effectively captured by simple regression models. Therefore, this paper proposes a size variation-aware method that uses expert knowledge from object detection to build a reinforcement learning framework for predicting vanishing lines in traffic scenes. Specifically, this method leverages size information from trained detectors to convert perspective images into BEV images without the need for additional camera intrinsic parameters. First, we design a novel reward mechanism that utilizes prior knowledge of scale differences between similar objects in perspective images, allowing the network to automatically update and learn specific vanishing line positions. Second, we propose a fast inverse perspective transformation method, which accelerates the training of the proposed approach. To evaluate the effectiveness of the method, experiments are conducted on two traffic flow datasets. The experimental results demonstrate that the proposed algorithm accurately predicts vanishing line positions and successfully transforms perspective images into BEV images. Furthermore, the proposed algorithm performs competitively with directly supervised methods. The code is available at https://github.com/HotChieh/DDRL

Citations: 0

A query-driven twin network framework with optimization-based meta-learning for few-shot hyperspectral image classification

IF 7.6, Zone 1 (Computer Science)

Pattern Recognition, Pub Date: 2025-08-31, DOI: 10.1016/j.patcog.2025.112331

Jian Zhu, Pengxin Wang, Jian Hui, Xin Ye

Volume 172, Article 112331

Abstract: Deep learning has achieved remarkable results in hyperspectral image (HSI) classification due to its powerful deep feature extraction and nonlinear relationship processing capabilities. However, the success of deep learning methods largely depends on extensive labeled samples, the collection of which is both time-consuming and labor-intensive. To address this issue, a novel query-driven meta-learning twin network (QMTN) framework is proposed for HSI few-shot learning. QMTN uses two meta-learning channels, allowing for the comprehensive learning of meta-knowledge across diverse meta-tasks and enhancing learning efficiency. Within the QMTN framework, a lightweight spectral-spatial attention residual network is proposed for the extraction of HSI features. The network incorporates a residual mechanism in both the spectral and spatial feature extraction processes and includes an attention block that improves network performance by focusing on key locations in the spatial features. To maximize the use of the limited samples for constructing diverse meta-tasks, two meta-task generation approaches are employed, with and without simulated noise. Experiments on three public HSI datasets demonstrate that the QMTN framework effectively reduces the dependence on labeled samples in a single scene and significantly improves the classification performance and convergence of the internal network. The meta-task generation method with simulated noise can further improve the classification performance of the QMTN.
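The meta-task generation described in the abstract above can be pictured as N-way K-shot episode sampling, optionally with simulated noise. The sketch below is a generic illustration, not the QMTN procedure: the function name `make_meta_task`, the additive Gaussian noise model, and the decision to perturb both support and query samples are all assumptions.

```python
import random


def make_meta_task(spectra_by_class, n_way, k_shot, n_query, noise_std=0.0, rng=None):
    """Sample one few-shot episode as (support, query) lists of (spectrum, label).

    spectra_by_class maps a class id to a list of spectra (lists of floats).
    Labels are re-indexed 0..n_way-1 within the episode.
    noise_std > 0 adds Gaussian noise to every band (an assumed noise model).
    """
    rng = rng or random.Random(0)
    classes = rng.sample(sorted(spectra_by_class), n_way)
    support, query = [], []
    for episode_label, class_id in enumerate(classes):
        picked = rng.sample(spectra_by_class[class_id], k_shot + n_query)
        if noise_std > 0:
            picked = [
                [band + rng.gauss(0.0, noise_std) for band in spectrum]
                for spectrum in picked
            ]
        support += [(spectrum, episode_label) for spectrum in picked[:k_shot]]
        query += [(spectrum, episode_label) for spectrum in picked[k_shot:]]
    return support, query


# Toy data: 4 classes, 5 two-band spectra each.
toy = {c: [[float(c), float(i)] for i in range(5)] for c in range(4)}
support, query = make_meta_task(toy, n_way=3, k_shot=1, n_query=2)
noisy_support, noisy_query = make_meta_task(
    toy, n_way=3, k_shot=1, n_query=2, noise_std=0.05
)
```

Because each episode redraws classes and samples, a small labelled pool can yield many distinct meta-tasks, and the noisy variant increases their diversity further.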

Citations: 0