{"title":"MAMBO-NET: Multi-causal aware modeling backdoor-intervention optimization for medical image segmentation network","authors":"Ruiguo Yu , Yiyang Zhang , Yuan Tian , Yujie Diao , Di Jin , Witold Pedrycz","doi":"10.1016/j.patrec.2025.06.016","DOIUrl":"10.1016/j.patrec.2025.06.016","url":null,"abstract":"<div><div>Medical image segmentation methods generally assume that the process from medical image to segmentation is unbiased, and use neural networks to establish conditional probability models to complete the segmentation task. This assumption does not consider confusion factors, which can affect medical images, such as complex anatomical variations and imaging modality limitations. Confusion factors obfuscate the relevance and causality of medical image segmentation, leading to unsatisfactory segmentation results. To address this issue, we propose a multi-causal aware modeling backdoor-intervention optimization (MAMBO-NET) network for medical image segmentation. Drawing insights from causal inference, MAMBO-NET utilizes self-modeling with multi-Gaussian distributions to fit the confusion factors and introduce causal intervention into the segmentation process. Moreover, we design appropriate posterior probability constraints to effectively train the distributions of confusion factors. For the distributions to effectively guide the segmentation and mitigate and eliminate the impact of confusion factors on the segmentation, we introduce classical backdoor intervention techniques and analyze their feasibility in the segmentation task. Experiments on five medical image datasets demonstrate a maximum improvement of 2.28% in Dice score on three ultrasound datasets, with false discovery rate reduced by 1.49% and 1.87% for dermatoscopy and colonoscopy datasets respectively, indicating broad applicability.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 102-109"},"PeriodicalIF":3.3,"publicationDate":"2025-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144723663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust multimodal face anti-spoofing via frequency-domain feature refinement and aggregation","authors":"Rui Sun, Fei Wang, Xiaolu Yu, Xinjian Gao, Xudong Zhang","doi":"10.1016/j.patrec.2025.07.003","DOIUrl":"10.1016/j.patrec.2025.07.003","url":null,"abstract":"<div><div>The existing face anti-spoofing (FAS) methods face two main problems in practical applications: (1) single visible light modality may fail in low-light conditions; (2) insufficient consideration of noise interference. These problems limit the potential application of FAS models in real-world scenarios. To enhance the model’s robustness against environmental changes, we propose a multimodal FAS method that incorporates the frequency domain feature refinement and multi-stage aggregation. Specifically, during the feature extraction process, we utilize wavelet transform to selectively refine and reorganize high and low-frequency features. Additionally, we designed an RGB modality-guided feature interaction fusion module, where the fused features at different stages progressively improve the final discriminative features. The final results indicate that our method achieves excellent performance across multiple public datasets. Furthermore, we conducted experiments by randomly adding noise to the WMCA and CASIA-SURF datasets, and the results demonstrate that our method effectively leverages frequency information to maintain robustness against noise interference, also performing exceptionally well when handling low-quality images.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 31-36"},"PeriodicalIF":3.9,"publicationDate":"2025-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144662805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cross-Domain Few-Shot 3D Point Cloud Semantic Segmentation","authors":"Jiwei Xiao, Ruiping Wang, Chen He, Xilin Chen","doi":"10.1016/j.patrec.2025.07.001","DOIUrl":"10.1016/j.patrec.2025.07.001","url":null,"abstract":"<div><div>Training fully supervised 3D point cloud semantic segmentation models is hindered by the need for extensive datasets and expensive annotation, limiting rapid expansion to additional categories. In response to these challenges, Few-Shot 3D Point Cloud Semantic Segmentation (3D FS-SSeg) methods utilize less labeled scene data to generalize to new categories. However, these approaches still depend on laboriously annotated semantic labels in 3D scenes. To address this limitation, we propose a more practical task named Cross-Domain Few-Shot 3D Point Cloud Semantic Segmentation (3D CD-FS-SSeg). In this task, we expand the model’s ability to segment point clouds of novel classes in unknown scenes by leveraging a small amount of low-cost CAD object model data or RGB-D image data as a support set. To accomplish the above task, we propose an approach that consists of two main blocks: a Cross Domain Adaptation (CDA) module that transfers the contextual information of the query scene to the support object to reduce the cross-domain gap, and a Multiple Prototypes Discriminative (MPD) loss that enhances inter-class variation while reducing intra-class variation. Experimental results on the ScanNet and S3DIS datasets demonstrate that our proposed method provides a significant improvement on the 3D CD-FS-SSeg benchmark.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 51-57"},"PeriodicalIF":3.9,"publicationDate":"2025-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144702922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal biometric authentication using camera-based PPG and fingerprint fusion","authors":"Xue Xian Zheng , Bilal Taha , Muhammad Mahboob Ur Rahman , Mudassir Masood , Dimitrios Hatzinakos , Tareq Al-Naffouri","doi":"10.1016/j.patrec.2025.06.017","DOIUrl":"10.1016/j.patrec.2025.06.017","url":null,"abstract":"<div><div>This paper presents a multimodal biometric system fusing photoplethysmography (PPG) signals and fingerprints for robust human verification. Instead of relying on heterogeneous biosensors, the PPG signals and fingerprints are both obtained through video recordings from a smartphone’s camera, as users place their fingers on the lens. To capture the unique characteristics of each user, we propose a homogeneous neural network consisting of two structured state space model (SSM) encoders to handle the distinct modalities. Specifically, the fingerprint images are flattened into sequences of pixels, which, along with segmented PPG beat waveforms, are fed into the encoders. This is followed by a cross-modal attention mechanism to learn more nuanced feature representations. Furthermore, their feature distributions are aligned within a unified latent space, utilizing a distribution-oriented contrastive loss. This alignment facilitates the learning of intrinsic and transferable intermodal relationships, thereby improving the system’s performance with unseen data. Experimental results on the datasets collected for this study demonstrate the superiority of the proposed approach, validated across a broad range of evaluation metrics in both single-session and two-session authentication scenarios. The system achieved an accuracy of 100% and an equal error rate (EER) of 0.1% for single-session data, and an accuracy of 94.3% and an EER of 6.9% for two-session data.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 1-7"},"PeriodicalIF":3.9,"publicationDate":"2025-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144653942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bidirectional two-dimensional supervised multiset canonical correlation analysis for multi-view feature extraction","authors":"Jing Yang , Liya Fan , Quansen Sun , Xizhan Gao","doi":"10.1016/j.patrec.2025.06.024","DOIUrl":"10.1016/j.patrec.2025.06.024","url":null,"abstract":"<div><div>Bidirectional two-dimensional multiset canonical correlation analysis ((2D)<span><math><msup><mrow></mrow><mrow><mn>2</mn></mrow></msup></math></span>MCCA) studies the linear correlation between multiple datasets while not requiring vectorization of the image matrix. However, it does not use the class label information in the data during feature extraction. In order to fully utilize the class label information for feature extraction, this letter proposes a new method called bidirectional two-dimensional supervised multiset canonical correlation analysis ((2D)<span><math><msup><mrow></mrow><mrow><mn>2</mn></mrow></msup></math></span>SMCCA). The basic idea of (2D)<span><math><msup><mrow></mrow><mrow><mn>2</mn></mrow></msup></math></span>SMCCA is to replace the equality constraints with the inequality constraints, maximizing the correlation of multiple sets of data while maximizing the inter-class scatter and minimizing the intra-class scatter for intra-group data. Experiments on face image and object image databases show that the proposed method has good recognition performance, while the extracted features have strong discriminative ability.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 37-43"},"PeriodicalIF":3.9,"publicationDate":"2025-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144670867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application and optimization of lightweight visual SLAM in dynamic industrial environment","authors":"Zhendong Guo , Na Dong , Shuai Liu , Donghui Li , Wai Hung Ip , Kai Leung Yung","doi":"10.1016/j.patrec.2025.06.021","DOIUrl":"10.1016/j.patrec.2025.06.021","url":null,"abstract":"<div><div>With the increasing adoption of visual SLAM in industrial automation, maintaining real-time performance and robustness in dynamic environments presents a significant challenge. Traditional SLAM systems often struggle with interference from moving objects and real-time processing on resource-constrained devices, resulting in accuracy issues. This paper introduces a lightweight object detection algorithm that employs spatial-channel decoupling for efficient removal of dynamic objects. It utilizes Region-Adaptive Deformable Convolution (RAD-Conv) to minimize computational complexity and incorporates a lightweight Convolutional Neural Network(CNN) architecture to enhance real-time performance and accuracy. Additionally, a novel loop closure detection method improves localization accuracy by mitigating cumulative errors. Experimental results demonstrate the system’s exceptional real-time performance, accuracy, and robustness in complex industrial scenarios, providing a promising solution for visual SLAM in industrial automation.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"196 ","pages":"Pages 319-327"},"PeriodicalIF":3.9,"publicationDate":"2025-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144631125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prompt-based Weakly-supervised Vision-language Pre-training","authors":"Zixin Guo , Tzu-Jui Julius Wang , Selen Pehlivan , Abduljalil Radman , Min Cao , Jorma Laaksonen","doi":"10.1016/j.patrec.2025.06.020","DOIUrl":"10.1016/j.patrec.2025.06.020","url":null,"abstract":"<div><div>Weakly-supervised Vision-Language Pre-training (W-VLP) explores methods leveraging weak cross-modal supervision, typically relying on object tags generated by a pre-trained object detector (OD) from images. However, training such an OD necessitates dense cross-modal information, including images paired with numerous object-level annotations. To alleviate that requirement, this paper addresses W-VLP in two stages: (1) creating data with weaker cross-modal supervision and (2) pre-training a vision-language (VL) model with the created data. The data creation process involves collecting knowledge from large language models (LLMs) to describe images. Given a category label of an image, its descriptions generated by an LLM are used as the language counterpart. This knowledge supplements what can be obtained using an OD, such as spatial relationships among objects most likely appearing in a scene. To mitigate the noise in the LLM-generated descriptions that destabilizes the training process and may lead to overfitting, we incorporate knowledge distillation and external retrieval-augmented knowledge during pre-training. Furthermore, we present an effective VL model pre-trained with the created data. Empirically, despite its weaker cross-modal supervision, our pre-trained VL model notably outperforms other W-VLP works in image and text retrieval tasks, e.g., VLMixer by 17.7% on MSCOCO and RELIT by 11.25% on Flickr30K relatively in Recall@1 in text-to-image retrieval task. It also shows superior performance on other VL downstream tasks, making a big stride towards matching the performances of strongly supervised VLP models. The results reveal the effectiveness of the proposed W-VLP methodology.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 8-15"},"PeriodicalIF":3.9,"publicationDate":"2025-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144653943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trusty Visual Intelligence Model for Leather Defect Detection Using ConvNeXtBase and Coyote Optimized Extra Tree","authors":"Brij B. Gupta , Akshat Gaurav , Razaz Waheeb Attar , Varsha Arya , Ahmed Alhomoud","doi":"10.1016/j.patrec.2025.06.019","DOIUrl":"10.1016/j.patrec.2025.06.019","url":null,"abstract":"<div><div>The leather industry continuously strives to ensure high product quality, yet defects often arise during stages like tanning, dyeing, and material handling. Traditional manual inspections are inconsistent, creating a need for automated, reliable visual intelligence systems. This paper introduces a Trusty Visual Intelligence Model for Leather Defect Detection Using ConvNeXtBase and Coyote Optimized Extra Tree. ConvNeXtBase is utilized for feature extraction, while an ExtraTreesClassifier, optimized with the Coyote Optimization Algorithm (COA), is employed for accurate defect classification, identifying issues like grain off, loose grains, and pinholes. Comparative analysis with models such as SVM, XGBoost, and LGBMClassifier demonstrates superior accuracy (0.90), precision, recall, and F1 score. The COA-optimized ExtraTreesClassifier is efficient and effective, making it ideal for real-time industrial applications.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"196 ","pages":"Pages 312-318"},"PeriodicalIF":3.9,"publicationDate":"2025-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144580384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic canine emotion recognition through multimodal approach","authors":"Eliaf Garcia-Loya , Irvin Hussein Lopez-Nava , Humberto Pérez-Espinosa , Veronica Reyes-Meza , Mariel Urbina-Escalante","doi":"10.1016/j.patrec.2025.06.018","DOIUrl":"10.1016/j.patrec.2025.06.018","url":null,"abstract":"<div><div>This study introduces a comprehensive multimodal approach for analyzing and classifying emotions in dogs, combining visual, inertial, and physiological data to improve emotion recognition performance. The research focuses on the dimensions of valence and arousal to categorize dog emotions into four quadrants: playing, frustration, abandonment, and petting. A custom-developed device (PATITA) was used for synchronized data collection to which a feature extraction process based on windowing was done. Dimensionality reduction and feature selection techniques were applied to identified most relevant features across data types. Then, several unimodal and multimodal classification models, including Naïve Bayes, SVM, ExtraTrees, and kNN, were trained and evaluated. Experimental results demonstrated the superiority of the multimodal approach, with ExtraTrees classifier consistently yielding the best results (F1-score = 0.96), using the reduced feature set. In conclusion, this work presents a robust multimodal framework for canine emotion recognition, providing a foundation for future studies to refine techniques and overcome current limitations, particularly through more sophisticated models and expanded data collection.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"196 ","pages":"Pages 351-357"},"PeriodicalIF":3.3,"publicationDate":"2025-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144931663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}