{"title":"Granular Ball K-Class Twin Support Vector Classifier","authors":"M.A. Ganaie , Vrushank Ahire , Anouck Girard","doi":"10.1016/j.patcog.2025.111636","DOIUrl":"10.1016/j.patcog.2025.111636","url":null,"abstract":"<div><div>This paper introduces the Granular Ball K-Class Twin Support Vector Classifier (GB-TWKSVC), a novel multi-class classification framework that combines Twin Support Vector Machines (TWSVM) with granular ball computing. The proposed method addresses key challenges in multi-class classification: granular ball representation improves noise robustness, while TWSVM’s non-parallel hyperplane architecture solves two smaller quadratic programming problems, enhancing efficiency. Our approach introduces a novel formulation that effectively handles multi-class scenarios, advancing traditional binary classification methods. Experimental evaluation on nine UCI benchmark datasets demonstrates that GB-TWKSVC significantly outperforms state-of-the-art classifiers in both accuracy and computational performance, achieving up to 5% higher accuracy and 50% faster computation than Twin-KSVC and 1-versus-rest TSVM. Notably, it attains 99.34% accuracy on Iris and 91.04% on Ecoli, surpassing competing methods. The method’s effectiveness is validated through comprehensive statistical tests and complexity analysis, establishing a mathematically sound framework.
The results highlight GB-TWKSVC’s potential in pattern recognition, fault diagnosis, and large-scale data analytics, owing to its ability to capture fine-grained features in high-dimensional data, making it a valuable advancement in classification algorithms.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111636"},"PeriodicalIF":7.5,"publicationDate":"2025-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143855887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
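The abstract above rests on granular ball computing: data is covered by "balls" that are recursively split until each ball is sufficiently label-pure, so downstream classifiers see ball summaries rather than noisy individual points. A minimal sketch of that generation step follows; the 2-means splitting rule and the purity threshold `theta` are illustrative choices, not the paper's exact procedure.

```python
import math
import random

def granular_balls(points, theta=0.9):
    """Recursively split labeled points (features, label) into balls until each
    ball's purity (majority-label fraction) reaches theta or it is a singleton.
    A minimal sketch of granular ball generation; theta is illustrative."""
    if len(points) <= 1 or purity(points) >= theta:
        return [points]
    left, right = two_means(points)
    if not left or not right:          # degenerate split: fall back to halving
        mid = len(points) // 2
        left, right = points[:mid], points[mid:]
    return granular_balls(left, theta) + granular_balls(right, theta)

def purity(ball):
    labels = [lab for _, lab in ball]
    return max(labels.count(l) for l in set(labels)) / len(labels)

def two_means(points, iters=10):
    # plain 2-means (Lloyd's algorithm) on the feature part of (features, label)
    rng = random.Random(0)
    c1, c2 = (p[0] for p in rng.sample(points, 2))
    g1, g2 = points, []
    for _ in range(iters):
        g1 = [p for p in points if math.dist(p[0], c1) <= math.dist(p[0], c2)]
        g2 = [p for p in points if math.dist(p[0], c1) > math.dist(p[0], c2)]
        if not g1 or not g2:
            break
        c1 = [sum(f) / len(g1) for f in zip(*(p[0] for p in g1))]
        c2 = [sum(f) / len(g2) for f in zip(*(p[0] for p in g2))]
    return g1, g2
```

Each resulting ball can then be summarized by its center and radius, which is what gives the method its noise robustness: a few mislabeled points inside a pure-enough ball do not change the ball-level representation.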
{"title":"Generative compositor for few-shot visual information extraction","authors":"Zhibo Yang , Wei Hua , Sibo Song , Cong Yao , Yingying Zhu , Wenqing Cheng , Xiang Bai","doi":"10.1016/j.patcog.2025.111624","DOIUrl":"10.1016/j.patcog.2025.111624","url":null,"abstract":"<div><div>Visual Information Extraction (VIE), aiming at extracting structured information from visually rich document images, plays a pivotal role in document processing. Considering various layouts, semantic scopes, and languages, VIE encompasses an extensive range of types, potentially numbering in the thousands. However, many of these types suffer from a lack of training data, which poses significant challenges. In this paper, we propose a novel generative model, named Generative Compositor, to address the challenge of few-shot VIE. The Generative Compositor is a hybrid pointer-generator network that emulates the operations of a compositor by retrieving words from the source text and assembling them based on the provided prompts. Furthermore, three pre-training strategies are employed to enhance the model’s perception of spatial context information. In addition, a prompt-aware resampler is designed to enable efficient matching by leveraging the entity-semantic prior contained in prompts. The introduction of the prompt-based retrieval mechanism and the pre-training strategies enable the model to acquire more effective spatial and semantic clues with limited training samples.
Experiments demonstrate that the proposed method achieves highly competitive results in the full-sample training, while notably outperforming the baseline in the 1-shot, 5-shot, and 10-shot settings.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111624"},"PeriodicalIF":7.5,"publicationDate":"2025-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143792211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
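The retrieve-and-assemble behavior of a pointer-generator network, as described in the abstract above, can be caricatured in a few lines: score every source word against a prompt embedding and copy the best match into the output. This is a toy stand-in only; the function names and dot-product scoring are assumptions, and the real model learns these embeddings jointly with spatial features.

```python
def pointer_retrieve(source_tokens, source_vecs, prompt_vec):
    """Toy pointer mechanism: score each source word against the prompt
    embedding by dot product and copy the highest-scoring word."""
    scores = [sum(a * b for a, b in zip(v, prompt_vec)) for v in source_vecs]
    return source_tokens[max(range(len(scores)), key=scores.__getitem__)]

def assemble(source_tokens, source_vecs, prompt_vecs):
    # one retrieved word per prompt slot, emitted in prompt order,
    # mimicking a compositor picking type from a case
    return [pointer_retrieve(source_tokens, source_vecs, p) for p in prompt_vecs]
```

Because outputs are copied from the source text rather than generated token by token, the model needs far fewer samples to produce well-formed values, which is the intuition behind its few-shot strength.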
{"title":"Infrared small target detection based on hypergraph and asymmetric penalty function","authors":"Yuan Luo, Xiaorun Li, Shuhan Chen","doi":"10.1016/j.patcog.2025.111634","DOIUrl":"10.1016/j.patcog.2025.111634","url":null,"abstract":"<div><div>Recently, the infrared (IR) small target detection problem has attracted increasing attention. Component analysis-based techniques have been widely utilized, but they face challenges in low-rank background estimation, sparse target estimation, and model construction. In this paper, an IR small target detection model with hypergraph Laplacian regularization and asymmetric penalty function-based regularization (HGLAPR) is proposed. Specifically, a spatial–temporal tensor is constructed. Then, we construct a hypergraph structure and design a hypergraph Laplacian regularization as well as a Laplace-based tensor nuclear norm for low-rank background estimation. Additionally, an asymmetric penalty function-based sparsity regularization is introduced for more accurate target estimation. To efficiently solve this model, we design an alternating direction method of multipliers (ADMM)-based optimization scheme.
Extensive experiments conducted on six real IR sequences with complex scenarios illustrate the superiority of HGLAPR over ten state-of-the-art competitive methods in terms of target detectability, background suppressibility and overall performance.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111634"},"PeriodicalIF":7.5,"publicationDate":"2025-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143799052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
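In ADMM schemes of the kind the abstract above describes, the sparse-target subproblem reduces to a proximal (shrinkage) step. The paper's asymmetric penalty would use an asymmetric shrinkage rule of its own; the standard symmetric L1 soft-threshold below is the usual baseline that such a rule replaces, shown only to make the sparse update concrete.

```python
import math

def soft_threshold(x, tau):
    """Proximal operator of tau*||.||_1, the textbook sparse-target update in
    ADMM-based low-rank plus sparse decompositions. Values with magnitude
    below tau are zeroed; the rest shrink toward zero by tau."""
    return [math.copysign(max(abs(v) - tau, 0.0), v) for v in x]
```

Applied to the vectorized residual between the observed tensor and the estimated low-rank background, this step keeps only entries whose magnitude survives the threshold, which is exactly where small bright targets are expected to remain.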
{"title":"CSA: Cross-scale alignment with adaptive semantic aggregation and filter for image–text retrieval","authors":"Zheng Liu , Junhao Xu , Shanshan Gao , Zhumin Chen","doi":"10.1016/j.patcog.2025.111647","DOIUrl":"10.1016/j.patcog.2025.111647","url":null,"abstract":"<div><div>Due to the inconsistency in feature representations between different modalities, known as the “Heterogeneous gap”, image–text retrieval (ITR) is a challenging task. To bridge this gap, establishing semantic associations between visual and textual parts of images and texts has been proven to be an effective strategy for the ITR task. However, existing ITR methods focus on establishing fixed-scale semantic associations by aligning visual and textual parts at fixed scales, namely, fixed-scale alignment (FSA). To overcome the limitations of FSA, cross-scale semantic associations, which exist between visual and textual parts at unfixed scales, should be sufficiently captured. Therefore, to improve the performance of current image–text retrieval systems through alignment free of scale constraints, we propose a novel cross-scale alignment (CSA) framework that strengthens connections between images and texts by thoroughly exploring cross-scale semantic associations. Firstly, to construct scale-adaptable semantic units, an adaptive semantic aggregation algorithm is developed, which generates both position-aware and co-occurrence-aware subsequences, and then adaptively merges them according to IoU values. Secondly, to filter out weak semantic associations in both the scale-balanced and scale-unbalanced alignment tasks, an adaptive semantic filter algorithm is presented, which learns two types of mask matrices by adaptively determining boundaries in probability density distributions.
Thirdly, to learn accurate image–text similarity, a semantic unit alignment strategy is proposed to freely align visual and textual semantic units across various unfixed scales. Extensive experiments demonstrate the superiority of CSA over state-of-the-art ITR methods. Code available at: <span><span>https://github.com/xjh0805/CSA</span></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111647"},"PeriodicalIF":7.5,"publicationDate":"2025-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143792228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
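The IoU-based merging of subsequences that the CSA abstract mentions can be illustrated on 1-D token spans: compute the intersection-over-union of two spans and greedily merge neighbors whose overlap exceeds a threshold. The greedy left-to-right rule and the 0.5 threshold are assumptions for illustration, not the paper's adaptive procedure.

```python
def span_iou(a, b):
    """IoU of two half-open 1-D token spans (start, end)."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0

def merge_by_iou(spans, thresh=0.5):
    """Greedily merge sorted spans whose IoU with the running span exceeds
    thresh, a simplified stand-in for the adaptive aggregation step."""
    spans = sorted(spans)
    merged = [spans[0]]
    for s in spans[1:]:
        if span_iou(merged[-1], s) > thresh:
            merged[-1] = (merged[-1][0], max(merged[-1][1], s[1]))
        else:
            merged.append(s)
    return merged
```

Merging strongly overlapping position-aware and co-occurrence-aware subsequences in this way yields semantic units whose scale adapts to the data instead of being fixed in advance.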
{"title":"Cross-task and time-aware adversarial attack framework for perception of autonomous driving","authors":"Yantao Lu , Ning Liu , Yilan Li , Jinchao Chen , Senem Velipasalar","doi":"10.1016/j.patcog.2025.111652","DOIUrl":"10.1016/j.patcog.2025.111652","url":null,"abstract":"<div><div>Despite the rapid advances in adversarial machine learning, state-of-the-art attack methods encounter practical limitations in the field of onboard perception that require real-time and multi-task processing. Conventional attacks typically target a specific perception task, such as object detection or segmentation, making it difficult to penetrate an entire multi-task perception module simultaneously. Although several cross-task transferable attacks have been proposed, these studies predominantly rely on model ensembling or iterative searching, both of which are often time-intensive and fail to meet the real-time processing requirements of autonomous driving platforms. To address these limitations, we propose the Perception Streaming Attack (PSA), a non-iterative cross-task adversarial attack framework. We first propose a Priori Perturbation Generator (PPG) to calculate a priori perturbation by leveraging the perturbation of the previous frame as well as the motion information between the previous and current frames. Then, we propose a Posterior Perturbation Updater (PPU) to refine the priori perturbation and obtain the final adversarial example for the current frame. Comprehensive experimental evaluations on the BDD100k and NuImages datasets demonstrate that the proposed PSA, compared with the state-of-the-art attacks, can effectively and efficiently attack across different tasks used in onboard perception. We also deploy our Perception Streaming Attack framework on a single-board computer (NVIDIA Jetson AGX Xavier) to validate the on-board performance.
The experimental results show that the proposed PSA can successfully run at 12 Hz and effectively erase at least 76% of the objects that should be detected.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111652"},"PeriodicalIF":7.5,"publicationDate":"2025-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143815698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
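The priori-perturbation idea in the PSA abstract, reusing the last frame's perturbation by moving it along the scene's motion, can be sketched as a simple warp. The dense-dict flow format and integer displacements below are hypothetical simplifications; a real implementation would use optical flow fields and interpolation.

```python
def propagate_perturbation(prev, flow):
    """Carry the previous frame's perturbation along per-pixel motion vectors
    to form a priori perturbation for the current frame. `flow` maps (i, j)
    to an integer displacement (di, dj); this format is a hypothetical
    stand-in for a dense optical flow field."""
    h, w = len(prev), len(prev[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            di, dj = flow.get((i, j), (0, 0))
            ni, nj = i + di, j + dj
            if 0 <= ni < h and 0 <= nj < w:   # drop pixels that move off-frame
                out[ni][nj] += prev[i][j]     # accumulate (splat) at the target
    return out
```

Because this warp is a single pass rather than an iterative optimization, it matches the non-iterative, real-time constraint the framework is built around; a lightweight posterior refinement then corrects what the warp gets wrong.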
{"title":"Stochastic limited memory bundle algorithm for clustering in big data","authors":"Napsu Karmitsa , Ville-Pekka Eronen , Marko M. Mäkelä , Tapio Pahikkala , Antti Airola","doi":"10.1016/j.patcog.2025.111654","DOIUrl":"10.1016/j.patcog.2025.111654","url":null,"abstract":"<div><div>Clustering is a crucial task in data mining and machine learning. In this paper, we propose an efficient algorithm, <span>Big-Clust</span>, for solving minimum sum-of-squares clustering problems in large and big datasets. We first develop a novel stochastic limited memory bundle algorithm (<span>SLMBA</span>) for large-scale nonsmooth finite-sum optimization problems and then formulate the clustering problem accordingly. The <span>Big-Clust</span> algorithm — a stochastic adaptation of the incremental clustering methodology — aims to find the global or a high-quality local solution for the clustering problem. It detects good starting points, i.e., initial cluster centers, for the <span>SLMBA</span>, applied as an underlying solver. We evaluate <span>Big-Clust</span> on several real-world datasets with numerous data points and features, comparing its performance with other clustering algorithms designed for large and big data. Numerical results demonstrate the efficiency of the proposed algorithm and the high quality of the solutions found, on par with the best existing methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111654"},"PeriodicalIF":7.5,"publicationDate":"2025-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143807260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
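The nonsmooth finite-sum objective behind minimum sum-of-squares clustering, which the Big-Clust abstract optimizes with a stochastic bundle method, is f(X) = (1/m) Σᵢ minₖ ‖aᵢ − xₖ‖². The sketch below states that objective and a naive mini-batch descent step on it; the step rule is a crude illustrative stand-in, not the paper's SLMBA update.

```python
def mssc_objective(centers, data):
    """Minimum sum-of-squares clustering objective:
    f(X) = (1/m) * sum_i min_k ||a_i - x_k||^2  (a nonsmooth finite sum)."""
    m = len(data)
    return sum(min(sum((ai - xi) ** 2 for ai, xi in zip(a, x))
                   for x in centers) for a in data) / m

def minibatch_step(centers, batch, lr=0.05):
    """One naive stochastic step: each sample in the batch pulls only its
    nearest center toward it. Shown to make the finite-sum structure concrete;
    the learning rate and update rule are illustrative assumptions."""
    new = [list(x) for x in centers]
    for a in batch:
        k = min(range(len(centers)),
                key=lambda j: sum((ai - xi) ** 2 for ai, xi in zip(a, centers[j])))
        for d in range(len(a)):
            new[k][d] += lr * 2.0 * (a[d] - centers[k][d])
    return [tuple(x) for x in new]
```

The inner `min` over centers is what makes the objective nonsmooth: the active center changes discontinuously as points cross cluster boundaries, which is why bundle methods, designed for nonsmooth optimization, are a natural fit here.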
{"title":"Boundary-aware and cross-modal fusion network for enhanced multi-modal brain tumor segmentation","authors":"Tongxue Zhou","doi":"10.1016/j.patcog.2025.111637","DOIUrl":"10.1016/j.patcog.2025.111637","url":null,"abstract":"<div><div>In recent years, brain tumor segmentation has emerged as a critical area of focus in medical image analysis. Accurate tumor delineation is essential for effective treatment planning and patient monitoring. Many existing algorithms struggle with accurately delineating complex tumor boundaries, particularly in cases where tumors exhibit heterogeneous features or blend with surrounding healthy tissues. In this paper, I propose a novel boundary-aware multi-modal brain tumor segmentation network, which integrates four key contributions to improve segmentation accuracy. First, I introduce a Boundary Extraction Module (BEM) to capture essential boundary information for segmentation. Second, I present a Boundary Guidance Module (BGM) to guide the segmentation process by incorporating boundary-specific information. Third, I design a Boundary Supervision Module (BSM) to enhance segmentation accuracy by providing multi-level boundary supervision. Lastly, I propose a Cross-feature Fusion (CFF) module that integrates complementary information from different MRI modalities to enhance overall segmentation performance.
Experimental results demonstrate that the proposed model outperforms state-of-the-art methods, achieving superior tumor segmentation accuracy across brain tumor segmentation datasets, thereby indicating its potential for clinical applications in neuroimaging.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111637"},"PeriodicalIF":7.5,"publicationDate":"2025-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143786240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
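The boundary supervision described in the abstract above needs boundary targets derived from segmentation masks. A common way to obtain them, and a plausible (though assumed, not paper-confirmed) form of the BEM/BSM targets, is to mark foreground pixels that touch background:

```python
def boundary_map(mask):
    """Mark foreground pixels with at least one background 4-neighbour.
    Out-of-image neighbours count as background, so pixels on the image
    border are boundary pixels. A minimal sketch of boundary-target
    extraction from a binary segmentation mask."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            if mask[i][j]:
                for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    if not (0 <= a < h and 0 <= b < w) or not mask[a][b]:
                        out[i][j] = 1
                        break
    return out
```

Supervising the network on such maps, in addition to the full masks, concentrates the loss on exactly the thin regions where tumors blend into healthy tissue and plain segmentation losses are weakest.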
{"title":"Implicit Image-to-Image Schrödinger Bridge for image restoration","authors":"Yuang Wang , Siyeop Yoon , Pengfei Jin , Matthew Tivnan , Sifan Song , Zhennong Chen , Rui Hu , Li Zhang , Quanzheng Li , Zhiqiang Chen , Dufan Wu","doi":"10.1016/j.patcog.2025.111627","DOIUrl":"10.1016/j.patcog.2025.111627","url":null,"abstract":"<div><div>Diffusion-based models have demonstrated remarkable effectiveness in image restoration tasks; however, their iterative denoising process, which starts from Gaussian noise, often leads to slow inference speeds. The Image-to-Image Schrödinger Bridge (I<span><math><msup><mrow></mrow><mrow><mn>2</mn></mrow></msup></math></span>SB) offers a promising alternative by initializing the generative process from corrupted images while leveraging training techniques from score-based diffusion models. In this paper, we introduce the Implicit Image-to-Image Schrödinger Bridge (I<span><math><msup><mrow></mrow><mrow><mn>3</mn></mrow></msup></math></span>SB) to further accelerate the generative process of I<span><math><msup><mrow></mrow><mrow><mn>2</mn></mrow></msup></math></span>SB. I<span><math><msup><mrow></mrow><mrow><mn>3</mn></mrow></msup></math></span>SB restructures the generative process into a non-Markovian framework by incorporating the initial corrupted image at each generative step, effectively preserving and utilizing its information. To enable direct use of pretrained I<span><math><msup><mrow></mrow><mrow><mn>2</mn></mrow></msup></math></span>SB models without additional training, we ensure consistency in marginal distributions. Extensive experiments across many image corruptions—including noise, low resolution, JPEG compression, and sparse sampling—and multiple image modalities—such as natural, human face, and medical images—demonstrate the acceleration benefits of I<span><math><msup><mrow></mrow><mrow><mn>3</mn></mrow></msup></math></span>SB.
Compared to I<span><math><msup><mrow></mrow><mrow><mn>2</mn></mrow></msup></math></span>SB, I<span><math><msup><mrow></mrow><mrow><mn>3</mn></mrow></msup></math></span>SB achieves the same perceptual quality with fewer generative steps, while maintaining or improving fidelity to the ground truth.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111627"},"PeriodicalIF":7.5,"publicationDate":"2025-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143792226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
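The non-Markovian restructuring that the I³SB abstract describes can be conveyed schematically: where a Markovian step would depend on the current state alone, each step here also blends in the network's clean-image estimate and the original corrupted image. The convex-combination form and the weights below are purely illustrative assumptions, not the paper's derived transition kernel.

```python
def non_markovian_step(x_t, x0_pred, y, alpha, beta):
    """Schematic I3SB-style update: the next state is a convex combination of
    the network's clean-image estimate x0_pred, the ORIGINAL corrupted image y
    (the non-Markovian ingredient), and the current state x_t. Weights alpha
    and beta are illustrative, not the paper's schedule."""
    gamma = 1.0 - alpha - beta
    return [alpha * p + beta * q + gamma * r for p, q, r in zip(x0_pred, y, x_t)]
```

Keeping y in every step means information from the corrupted measurement is never thrown away between steps, which is the intuition for why fewer generative steps suffice at the same perceptual quality.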
{"title":"Vision-language foundation model for generalizable nasal disease diagnosis using unlabeled endoscopic records","authors":"Xueli Liu , Wentao Gong , Xiao Chen , Zhen Li , Yinlong Liu , Li Wang , Quan Liu , Xicai Sun , Xiaofeng Liu , Xinrong Chen , Yuxuan Shi , Hongmeng Yu","doi":"10.1016/j.patcog.2025.111646","DOIUrl":"10.1016/j.patcog.2025.111646","url":null,"abstract":"<div><div>Medical artificial intelligence (AI) holds significant potential in identifying signs of health conditions in nasal endoscopic images, thereby accelerating the diagnosis of diseases and systemic disorders. However, the performance of AI models heavily relies on expert annotations, and these models are usually task-specific with limited generalization performance across various clinical applications. In this paper, we introduce NasVLM, a Nasal Vision-Language foundation Model designed to extract universal representations from unlabeled nasal endoscopic data. Additionally, we construct a large-scale nasal endoscopic pre-training dataset and three downstream validation datasets from routine diagnostic records. The core strength of NasVLM lies in its ability to learn cross-modal semantic representations and perform multi-granular report-image alignment without depending on expert annotations. Furthermore, to the best of our knowledge, it is the first medical foundation model that effectively aligns a medical report with multiple images of different anatomic regions, facilitated by a well-designed hierarchical report-supervised learning framework. The experimental results demonstrate that NasVLM has superior generalization performance across diverse diagnostic tasks and surpasses state-of-the-art self- and report-supervised methods in disease classification and lesion localization, especially in scenarios requiring label-efficient fine-tuning.
For instance, NasVLM can distinguish normal nasopharynx (NOR) from abnormalities (benign hyperplasia, BH, and nasopharyngeal carcinoma, NPC) with an accuracy of 91.38% (95% CI, 90.59 to 92.17) and differentiate NPC from BH and NOR with an accuracy of 81.45% (95% CI, 80.21 to 82.67) on the multi-center NPC-Screen dataset using only 1% labeled data, on par with the performance of traditional supervised methods using 100% labeled data.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111646"},"PeriodicalIF":7.5,"publicationDate":"2025-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143799053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
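The report-to-multi-image alignment that distinguishes NasVLM in the abstract above can be illustrated with a toy scoring rule: one report embedding is matched against the whole set of endoscopic views from a record rather than a single image. The mean-cosine aggregation below is an assumed simplification; the actual model learns a hierarchical, multi-granular alignment.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def report_set_score(report_vec, image_vecs):
    """Score one report against the SET of views from a record, here simply
    the mean cosine similarity over views. An illustration of report-to-
    multi-image alignment only; embeddings are learned in the real model."""
    return sum(cosine(report_vec, v) for v in image_vecs) / len(image_vecs)
```

Scoring at the record level lets a single free-text diagnostic report supervise several images of different anatomic regions at once, which is what removes the need for per-image expert labels.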
{"title":"Data-driven recognition of uncertainty by integrating matrix factorization and kernel smoothing methods","authors":"Ying Kou , Zhong Wan , Dandan Zhao","doi":"10.1016/j.patcog.2025.111650","DOIUrl":"10.1016/j.patcog.2025.111650","url":null,"abstract":"<div><div>Recognizing uncertainty from complex data sets plays a fundamental role in solving many decision-making problems arising from learning systems and cognitive sciences, especially by data-driven robust optimization approaches. In this paper, a novel recognition method is proposed to characterize data distribution with complicated features by fusing different learning techniques, so as to construct a disjunctive data-driven uncertainty set by concurrent identification of clustering features and distribution information underlying the data. Boundary constraints are employed to further tighten the uncertainty set by removing the empty regions. Based on such an uncertainty set, a data-driven static robust optimization framework is proposed, and its computationally tractable robust counterpart is presented. A column-and-constraint generation based algorithm is also developed for solving the uncertainty set-induced data-driven two-stage robust optimization model.
The efficiency and superiority of the proposed robust methods are illustrated by numerical tests involving the solution of a linear uncertain problem and a pre-inventory and reallocation problem under emergencies.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111650"},"PeriodicalIF":7.5,"publicationDate":"2025-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143792227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
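The kernel smoothing ingredient named in this abstract's title is typically a kernel density estimate of the uncertain data, from which distribution information (modes, high-density regions) is read off to shape the uncertainty set. A minimal 1-D Gaussian KDE follows; the bandwidth choice is illustrative, and the multivariate, factorization-coupled version in the paper is more involved.

```python
import math

def gaussian_kde(samples, h):
    """1-D Gaussian kernel density estimate with bandwidth h:
    f(x) = (1/(n*h)) * sum_s phi((x - s)/h), phi the standard normal pdf.
    Returns the estimated density as a callable."""
    n = len(samples)
    c = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return lambda x: c * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
```

Thresholding such a density estimate at a chosen level yields high-density regions; taking the union of those regions per cluster is one way a disjunctive uncertainty set with empty regions removed can be realized.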