Gege Zhang , Zhiyong Gan , Ling Deng , Shuaicheng Niu , Zhenghua Peng , Gang Dai , Shuangping Huang , Xiangmin Xu
{"title":"A text-only weakly supervised learning framework for text spotting via text-to-polygon generator","authors":"Gege Zhang , Zhiyong Gan , Ling Deng , Shuaicheng Niu , Zhenghua Peng , Gang Dai , Shuangping Huang , Xiangmin Xu","doi":"10.1016/j.patcog.2025.112408","DOIUrl":"10.1016/j.patcog.2025.112408","url":null,"abstract":"<div><div>Advanced text spotting methods typically rely on large-scale, meticulously labeled datasets to achieve satisfactory performance. However, annotating fine-grained positional information of texts in real-world scene images is extremely costly and time-consuming. Although some weakly supervised methods have been developed to reduce annotation costs, they face two major challenges: 1) their performance significantly lags behind the fully supervised counterparts, and 2) They are tightly coupled with specific text spotting models, meaning that switching to a different model would require retraining and incur substantial computational costs. To address these limitations, we propose a novel text-only weakly supervised learning framework for text spotting via text-to-polygon generator. In the first stage, we pretrain a text-to-polygon generator on an auxiliary dataset, e.g., synthetic or public datasets, where full annotations are readily accessible. In the second stage, given real-world target datasets annotated with text-only labels, we employ the pretrained generator to produce pseudo polygon labels, thereby constructing a pseudo-labeled supervised dataset for training text spotting models. To ensure high-quality pseudo polygon labels, the text-to-polygon generator first identifies all candidate text regions, then filters those that are relevant to the target text, and finally predicts their precise spatial locations. Notably, this generator requires only a single pretraining session and can subsequently be applied to any text spotting model and target text-only dataset without incurring additional costs. Extensive experiments on public benchmarks demonstrate that our method can significantly reduce labeling costs while maintaining competitive performance.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112408"},"PeriodicalIF":7.6,"publicationDate":"2025-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145049759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qianyao Qiang , Bin Zhang , Jason Chen Zhang , Feiping Nie
{"title":"Fast multi-view discrete clustering with two solvers","authors":"Qianyao Qiang , Bin Zhang , Jason Chen Zhang , Feiping Nie","doi":"10.1016/j.patcog.2025.112415","DOIUrl":"10.1016/j.patcog.2025.112415","url":null,"abstract":"<div><div>Multi-view graph clustering follows a three-phase process: constructing view-specific similarity graphs, fusing information from different views, and conducting eigenvalue decomposition followed by post-processing to obtain the clustering indicators. However, it encounters two key challenges: the high computational cost of graph construction and eigenvalue decomposition, and the inevitable information deviation introduced by the last process. To tackle these obstacles, we propose Fast Multi-view Discrete Clustering with two solvers (FMDC), to directly and efficiently solve the multi-view graph clustering problem. FMDC involves: (1) generating a compact set of representative anchors to construct anchor graphs, (2) automatically weighting them into a symmetric and doubly stochastic aggregated similarity matrix, (3) executing clustering on the aggregated form with the discrete indicator matrix directly computed through two efficient solvers that we devised. The linear computational complexity of FMDC w.r.t. data size is a notable improvement over traditional quadratic or cubic complexity. Extensive experiments confirm the superior performance of FMDC both in efficiency and in effectiveness.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112415"},"PeriodicalIF":7.6,"publicationDate":"2025-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145020171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Knowledge tailoring: Bridging the teacher-student gap in semantic segmentation","authors":"Seokhwa Cheung , Seungbeom Woo , Taehoon Kim , Wonjun Hwang","doi":"10.1016/j.patcog.2025.112399","DOIUrl":"10.1016/j.patcog.2025.112399","url":null,"abstract":"<div><div>Knowledge distillation transfers knowledge from a high-capacity teacher network to a compact student network, but a large capacity gap often limits the student’s ability to fully benefit from the teacher’s guidance. In semantic segmentation, another major challenge is the difficulty in predicting accurate object boundaries, as even strong teacher models can produce ambiguous or imprecise outputs. To address both challenges, we present Knowledge Tailoring, a novel distillation framework that adapts the teacher’s knowledge to better match the student’s representational capacity and learning dynamics. Much like a tailor adjusts an oversized suit to fit the wearer’s shape, our method reshapes the teacher’s abundant but misaligned knowledge into a form more suitable for the student. KT introduces feature tailoring, which restructures intermediate features based on channel-wise correlation to narrow the representation gap, and logit tailoring, which improves boundary prediction by refining class-specific logits. The tailoring strategy evolves throughout training, offering guidance that aligns with the student’s progress. Experiments on Cityscapes, Pascal VOC, and ADE20K confirm that KT consistently enhances performance across a variety of architectures including DeepLabV3, PSPNet, and SegFormer. Our code is available for <span><span>https://github.com/seok-hwa/KT</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112399"},"PeriodicalIF":7.6,"publicationDate":"2025-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145046645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shenglin Peng , Xingguo Zhao , Jun Wang , Lin Wang , Shuyi Qu , Jingye Peng , Xianlin Peng
{"title":"DURN: Data uncertainty-driven robust network for mural sketch detection","authors":"Shenglin Peng , Xingguo Zhao , Jun Wang , Lin Wang , Shuyi Qu , Jingye Peng , Xianlin Peng","doi":"10.1016/j.patcog.2025.112404","DOIUrl":"10.1016/j.patcog.2025.112404","url":null,"abstract":"<div><div>Mural sketches reveal both the content and structure of the murals and are crucial for the preservation of murals. However, existing methods lack robustness, making it difficult to suppress noise while preserving sketches on damaged murals and fully capturing details on clear murals. To address this, we propose a Data Uncertainty-Driven Robust Network (DURN) for mural sketch detection. DURN uses uncertainty to quantify noise in the murals, converting prediction into a learnable normal distribution, where the mean represents the sketch and the variance denotes the uncertainty. This enables the model to learn both the sketch and the noise simultaneously, achieving noise suppression while preserving the sketches. To enhance sketches, we design an Adaptive Fusion Feature Enhancement Module (AFFE) to dynamically adjust the fusion strategy according to the contribution of features at different scales and reduce the information loss caused by feature dimensionality reduction to maximize the utility of each feature. We develop a novel Deep-Shallow Supervision (DSS) module to mitigate background noise using deep semantic information to guide shallow features without adding parameters. Additionally, we achieve model lightweighting through pruning techniques, ensuring competitive performance while reducing the number of parameters to only 4.5 % of the original. The experimental results show an improvement of 10. 4 % AP over existing methods, demonstrating the robustness of DURN for complex and damaged murals. The source code is available at <span><span>https://github.com/TIVEN-Z/DURN</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112404"},"PeriodicalIF":7.6,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145009880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IRIS: An information path planning method based on reinforcement learning and information-directed sampling","authors":"Ziyuan Liu , Yan Zhuang , Peng Wu , Yuanchang Liu","doi":"10.1016/j.patcog.2025.112400","DOIUrl":"10.1016/j.patcog.2025.112400","url":null,"abstract":"<div><div>Information Path Planning (IPP) is a critical aspect of robotics, aimed at intelligently selecting information-rich paths to optimize robot trajectories and significantly enhance the efficiency and quality of data collection. However, in the process of maximizing information acquisition, IPP must also account for energy consumption, time constraints, and physical obstacles, which often lead to inefficiencies. To address these challenges, we propose an Information Path Planning method based on Reinforcement Learning and Information-Directed Sampling (IRIS). This model is the first to integrate Reinforcement Learning (RL) with Information-Directed Sampling (IDS), ensuring both immediate rewards and the potential for greater information gain through exploratory actions. IRIS employs an off-policy deep reinforcement learning framework, effectively overcoming the limitations observed in on-policy methods, thereby enhancing the model’s adaptability and efficiency. Simulation results demonstrate that the IRIS algorithm performs exceptionally well across various IPP scenarios. Once training stabilizes, IDS will dominate decision-making with a probability of approximately 1.3 % to yield better outcomes, highlighting its significant potential in this field. The relevant code is available at <span><span>https://github.com/SUTLZY/IRIS</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112400"},"PeriodicalIF":7.6,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145046650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Riran Cheng , Xupeng Wang , Ferdous Sohel , Hang Lei
{"title":"Joint Adversarial Attack: An Effective Approach to Evaluate Robustness of 3D Object Tracking","authors":"Riran Cheng , Xupeng Wang , Ferdous Sohel , Hang Lei","doi":"10.1016/j.patcog.2025.112359","DOIUrl":"10.1016/j.patcog.2025.112359","url":null,"abstract":"<div><div>Deep neural networks (DNNs) have widely been used in 3D object tracking, thanks to its superior capabilities to learn from geometric training samples and locate tracking targets. Although the DNN based trackers show vulnerability to adversarial examples, their robustness in real-world scenarios with potentially complex data defects has rarely been studied. To this end, a joint adversarial attack method against 3D object tracking is proposed, which simulates defects of the point cloud data in the form of point filtration and perturbation simultaneously. Specifically, a voxel-based point filtration module is designed to filter points of the tracking template, which is described by the voxel-wise binary distribution regarding the density of the point cloud. Furthermore, a voxel-based point perturbation module adds voxel-wise perturbations to the filtered template, whose direction is constrained by local geometrical information of the template. Experiments conducted on popular 3D trackers demonstrate that the proposed joint attack have decreased the success and precision of existing 3D trackers by 30.2% and 35.4% respectively in average, which made an improvement of 30.5% over existing attack methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112359"},"PeriodicalIF":7.6,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145046651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Paolo Andreini , Marco Tanfoni , Simone Bonechi , Monica Bianchini
{"title":"Leveraging synthetic data for zero–shot and few–shot circle detection in real–world domains","authors":"Paolo Andreini , Marco Tanfoni , Simone Bonechi , Monica Bianchini","doi":"10.1016/j.patcog.2025.112407","DOIUrl":"10.1016/j.patcog.2025.112407","url":null,"abstract":"<div><div>Circle detection plays a pivotal role in computer vision, underpinning applications from industrial inspection and bioinformatics to autonomous driving. Traditional methods, however, often struggle with real–world complexities, as they demand extensive parameter tuning and adaptation across different domains. In this paper, we present the Synthetic Circle Dataset (SynCircle), a large synthetic image dataset designed to train a YOLO v10 network for circle detection. The YOLO v10 network, pre–trained solely on synthetic data, demonstrates remarkable off–the–shelf performance that surpasses conventional methods in various practical scenarios. Furthermore, we show that incorporating just a few labeled real images for fine–tuning can significantly boost performance, reducing the need for large annotated datasets. To promote reproducibility and streamline adoption, we publicly release both the trained YOLO v10 weights and the full SynCircle dataset.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112407"},"PeriodicalIF":7.6,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145020176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiayang Shi , Shangfeng Chen , Gang Zhang , Wei Wei , Yinlin Li , Zhaoxin Fan , Jingjing Liu
{"title":"Jailbreak attack with multimodal virtual scenario hypnosis for vision-language models","authors":"Xiayang Shi , Shangfeng Chen , Gang Zhang , Wei Wei , Yinlin Li , Zhaoxin Fan , Jingjing Liu","doi":"10.1016/j.patcog.2025.112391","DOIUrl":"10.1016/j.patcog.2025.112391","url":null,"abstract":"<div><div>Due to the inherent vulnerabilities of large Vision-Language Models (VLMs), security governance has emerged as a critical concern, particularly given the risks posed by noisy and biased training data as well as adversarial attacks, including data poisoning and prompt injection. These perturbations can significantly degrade model performance and introduce multifaceted societal risks. To verify the safe robustness of VLMs and further inspire the design of defensive AI frameworks, we propose Virtual Scenario Hypnosis (VSH), a multimodal prompt injection jailbreak method that embeds malicious queries into prompts through a deceptive narrative framework. This approach strategically distracts the model while compromising its resistance to jailbreak attempts. Our methodology features two key innovations: 1) Targeted adversarial image prompts that transform textual content into visual layouts through optimized typographic designs, circumventing safety alignment mechanisms to elicit harmful responses; and 2) An information veil encrypted In-Context Learning (ICL) method for text prompts that systematically evades safety detection protocols. To streamline evaluation, we employ Large Language Models (LLMs) to facilitate an efficient assessment of jailbreak success rates, supported by a meticulously designed prompt template incorporating multi-dimensional scoring rules and evaluation metrics. Extensive experiments demonstrate the efficacy of VSH, achieving an overall success rate exceeding 82% on 500 harmful queries spanning multiple domains when tested against LLaVA-v1.5-13B and GPT-4o mini.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112391"},"PeriodicalIF":7.6,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145020177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised instance segmentation with superpixels","authors":"Cuong Manh Hoang","doi":"10.1016/j.patcog.2025.112402","DOIUrl":"10.1016/j.patcog.2025.112402","url":null,"abstract":"<div><div>Instance segmentation is essential for numerous computer vision applications, including robotics, human-computer interaction, and autonomous driving. Currently, popular models bring impressive performance in instance segmentation by training with a large number of human annotations, which are costly to collect. For this reason, we present a new framework that efficiently and effectively segments objects without the need for human annotations. Firstly, a MultiCut algorithm is applied to self-supervised features for coarse mask segmentation. Then, a mask filter is employed to obtain high-quality coarse masks. To train the segmentation network, we compute a novel superpixel-guided mask loss, comprising hard loss and soft loss, with high-quality coarse masks and superpixels segmented from low-level image features. Lastly, a self-training process with a new adaptive loss is proposed to improve the quality of predicted masks. We conduct experiments on public datasets in instance segmentation and object detection to demonstrate the effectiveness of the proposed framework. The results show that the proposed framework outperforms previous state-of-the-art methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112402"},"PeriodicalIF":7.6,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145057366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jianguo Ju , Menghao Liu , Wenhuan Song , Tongtong Zhang , Jindong Liu , Pengfei Xu , Ziyu Guan
{"title":"A boundary-enhanced and target-driven deformable convolutional network for abdominal multi-organ segmentation","authors":"Jianguo Ju , Menghao Liu , Wenhuan Song , Tongtong Zhang , Jindong Liu , Pengfei Xu , Ziyu Guan","doi":"10.1016/j.patcog.2025.112386","DOIUrl":"10.1016/j.patcog.2025.112386","url":null,"abstract":"<div><div>It is crucial to accurately segment organs from abdominal CT images for clinical diagnosis, treatment planning, and surgical guidance, which remains an extremely challenging task due to low contrast between organs and surrounding tissues and the difference of organ size and shape. Previous works mainly focused on complex network architectures or task-specific modules but frequently failed to learn irregular boundaries and did not consider that different slices from the same case might contain targets of different numbers of categories. To tackle these issues, this paper proposes UAMSNet for abdominal multi-organ segmentation. In UAMSNet, a hybrid receptive field extraction (HRFE) module is introduced to adaptively learn the features of irregular targets, which has an adaptive dilation factor containing distance information to facilitate spatial and channel attention. The HRFE module can simultaneously learn multiple scales and deformations of different organs. Furthermore, a multi-organ boundary-enhanced attention (MBA) module in the encoder and decoder is designed to provide effective boundary information for feature extraction based on the large peak of the organ edge. Finally, the difference in the number of organ categories between different slices is first considered using a loss function, which can adjust the loss computation based on organ categories in the image. The loss function mitigates the effect of false positives during training to ensure the model can adapt to small organ segmentation. Experimental results on WORD and Synapse datasets demonstrate that our UAMSNet outperforms the existing state-of-the-art methods. Ablation experiments confirm the effectiveness of our designed modules and loss function. Our code is publicly available on <span><span>https://github.com/HeyJGJu/UAMSNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112386"},"PeriodicalIF":7.6,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145096370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}