{"title":"InDeepFake: A novel multimodal multilingual indian deepfake video dataset","authors":"Arnab Kumar Das , Aritra Bose , Priya Manohar , Anurag Dutta , Ruchira Naskar , Rajat Subhra Chakraborty","doi":"10.1016/j.patrec.2025.07.002","DOIUrl":"10.1016/j.patrec.2025.07.002","url":null,"abstract":"<div><div>Recent advancements in Generative AI have resulted in decline of online digital contents credibility, at all levels of the human society. In spite of numerous discussions in popular media on the grave risks exposed by deepfakes and the relative lack of human awareness, deepfake based illegal activities are on the rise all over the world. India as a nation has seen rapid surge in deepfake cases reported in recent times, with news channels and media flooded with cases of financial fraudulence, personal vendetta, and false political propaganda, especially before the national and state elections. This can prove detrimental against the democratic future of the nation, indicating a serious need for efficient deepfake detectors in the coming days, tailored to investigate and solve Indian deepfake cases. The task is particularly challenging given the great linguistic and ethnic diversity of India. Based on this motivation, in our work, we develop an extensive deepfake dataset for the Indian population. To the best of our knowledge, this is the first such effort that is reported. We have developed a multimodal audio–video deepfake dataset, in seven major Indian languages, and seven state-of-the-art (SOTA) deepfake generators, covering a wide range of age and gender diversity. We evaluated SOTA detector results on the proposed dataset, to highlight its relevance in furthering multimodal deepfake research. We have open-sourced the dataset and code to implement the baseline methods at: <span><span>https://github.com/arnabdasphd/InDeepFake</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 16-23"},"PeriodicalIF":3.9,"publicationDate":"2025-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144662803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Document level Relationship Extraction based on context feature enhancement","authors":"Nan Zhang, Ziming Cui, Qiang Cai","doi":"10.1016/j.patrec.2025.07.006","DOIUrl":"10.1016/j.patrec.2025.07.006","url":null,"abstract":"<div><div>Document level Relationship Extraction (DocRE) tasks aim to extract relationships between multiple entities from long texts. However, obtaining feature representations for entity pairs that span multiple sentences is a challenge. Additionally, the feature information for triplets depends on both intra-document and inter-sentence information. To address this issue, this paper proposes a model named Plus-DocRE for DocRE(PDRE). Firstly, we introduce entity segmentation based on spans to increase the potential number of entities and improve negative sample recognition. Secondly, we utilize the BERT pre-trained model to obtain paragraph and local context information, enriching the features of entity pairs. Finally, through linear layers and self-attention mechanisms, we fuse the features of local and paragraph context for multi-label relationship classification, enabling entity relationship extraction. Meanwhile, we introduce a new data mechanism, C-DocRE, to simulate a more realistic scenario with annotation errors. Experimental results show that the PDRE model outperforms other baseline models in performance, achieving an F1 score of 53.6.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 24-30"},"PeriodicalIF":3.9,"publicationDate":"2025-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144662804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MAMBO-NET: Multi-causal aware modeling backdoor-intervention optimization for medical image segmentation network","authors":"Ruiguo Yu , Yiyang Zhang , Yuan Tian , Yujie Diao , Di Jin , Witold Pedrycz","doi":"10.1016/j.patrec.2025.06.016","DOIUrl":"10.1016/j.patrec.2025.06.016","url":null,"abstract":"<div><div>Medical image segmentation methods generally assume that the process from medical image to segmentation is unbiased, and use neural networks to establish conditional probability models to complete the segmentation task. This assumption does not consider confusion factors, which can affect medical images, such as complex anatomical variations and imaging modality limitations. Confusion factors obfuscate the relevance and causality of medical image segmentation, leading to unsatisfactory segmentation results. To address this issue, we propose a multi-causal aware modeling backdoor-intervention optimization (MAMBO-NET) network for medical image segmentation. Drawing insights from causal inference, MAMBO-NET utilizes self-modeling with multi-Gaussian distributions to fit the confusion factors and introduce causal intervention into the segmentation process. Moreover, we design appropriate posterior probability constraints to effectively train the distributions of confusion factors. For the distributions to effectively guide the segmentation and mitigate and eliminate the impact of confusion factors on the segmentation, we introduce classical backdoor intervention techniques and analyze their feasibility in the segmentation task. Experiments on five medical image datasets demonstrate a maximum improvement of 2.28% in Dice score on three ultrasound datasets, with false discovery rate reduced by 1.49% and 1.87% for dermatoscopy and colonoscopy datasets respectively, indicating broad applicability.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 102-109"},"PeriodicalIF":3.3,"publicationDate":"2025-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144723663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust multimodal face anti-spoofing via frequency-domain feature refinement and aggregation","authors":"Rui Sun, Fei Wang, Xiaolu Yu, Xinjian Gao, Xudong Zhang","doi":"10.1016/j.patrec.2025.07.003","DOIUrl":"10.1016/j.patrec.2025.07.003","url":null,"abstract":"<div><div>The existing face anti-spoofing (FAS) methods face two main problems in practical applications: (1) single visible light modality may fail in low-light conditions; (2) insufficient consideration of noise interference. These problems limit the potential application of FAS models in real-world scenarios. To enhance the model’s robustness against environmental changes, we propose a multimodal FAS method that incorporates the frequency domain feature refinement and multi-stage aggregation. Specifically, during the feature extraction process, we utilize wavelet transform to selectively refine and reorganize high and low-frequency features. Additionally, we designed an RGB modality-guided feature interaction fusion module, where the fused features at different stages progressively improve the final discriminative features. The final results indicate that our method achieves excellent performance across multiple public datasets. Furthermore, we conducted experiments by randomly adding noise to the WMCA and CASIA-SURF datasets, and the results demonstrate that our method effectively leverages frequency information to maintain robustness against noise interference, also performing exceptionally well when handling low-quality images.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 31-36"},"PeriodicalIF":3.9,"publicationDate":"2025-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144662805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cross-Domain Few-Shot 3D Point Cloud Semantic Segmentation","authors":"Jiwei Xiao, Ruiping Wang, Chen He, Xilin Chen","doi":"10.1016/j.patrec.2025.07.001","DOIUrl":"10.1016/j.patrec.2025.07.001","url":null,"abstract":"<div><div>Training fully supervised 3D point cloud semantic segmentation models is hindered by the need for extensive datasets and expensive annotation, limiting rapid expansion to additional categories. In response to these challenges, Few-Shot 3D Point Cloud Semantic Segmentation (3D FS-SSeg) methods utilize less labeled scene data to generalize to new categories. However, these approaches still depend on laboriously annotated semantic labels in 3D scenes. To address this limitation, we propose a more practical task named Cross-Domain Few-Shot 3D Point Cloud Semantic Segmentation (3D CD-FS-SSeg). In this task, we expand the model’s ability to segment point clouds of novel classes in unknown scenes by leveraging a small amount of low-cost CAD object model data or RGB-D image data as a support set. To accomplish the above task, we propose an approach that consists of two main blocks: a Cross Domain Adaptation (CDA) module that transfers the contextual information of the query scene to the support object to reduce the cross-domain gap, and a Multiple Prototypes Discriminative (MPD) loss that enhances inter-class variation while reducing intra-class variation. Experimental results on the ScanNet and S3DIS datasets demonstrate that our proposed method provides a significant improvement on the 3D CD-FS-SSeg benchmark.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 51-57"},"PeriodicalIF":3.9,"publicationDate":"2025-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144702922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal biometric authentication using camera-based PPG and fingerprint fusion","authors":"Xue Xian Zheng , Bilal Taha , Muhammad Mahboob Ur Rahman , Mudassir Masood , Dimitrios Hatzinakos , Tareq Al-Naffouri","doi":"10.1016/j.patrec.2025.06.017","DOIUrl":"10.1016/j.patrec.2025.06.017","url":null,"abstract":"<div><div>This paper presents a multimodal biometric system fusing photoplethysmography (PPG) signals and fingerprints for robust human verification. Instead of relying on heterogeneous biosensors, the PPG signals and fingerprints are both obtained through video recordings from a smartphone’s camera, as users place their fingers on the lens. To capture the unique characteristics of each user, we propose a homogeneous neural network consisting of two structured state space model (SSM) encoders to handle the distinct modalities. Specifically, the fingerprint images are flattened into sequences of pixels, which, along with segmented PPG beat waveforms, are fed into the encoders. This is followed by a cross-modal attention mechanism to learn more nuanced feature representations. Furthermore, their feature distributions are aligned within a unified latent space, utilizing a distribution-oriented contrastive loss. This alignment facilitates the learning of intrinsic and transferable intermodal relationships, thereby improving the system’s performance with unseen data. Experimental results on the datasets collected for this study demonstrate the superiority of the proposed approach, validated across a broad range of evaluation metrics in both single-session and two-session authentication scenarios. The system achieved an accuracy of 100% and an equal error rate (EER) of 0.1% for single-session data, and an accuracy of 94.3% and an EER of 6.9% for two-session data.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 1-7"},"PeriodicalIF":3.9,"publicationDate":"2025-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144653942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bidirectional two-dimensional supervised multiset canonical correlation analysis for multi-view feature extraction","authors":"Jing Yang , Liya Fan , Quansen Sun , Xizhan Gao","doi":"10.1016/j.patrec.2025.06.024","DOIUrl":"10.1016/j.patrec.2025.06.024","url":null,"abstract":"<div><div>Bidirectional two-dimensional multiset canonical correlation analysis ((2D)<span><math><msup><mrow></mrow><mrow><mn>2</mn></mrow></msup></math></span>MCCA) studies the linear correlation between multiple datasets while not requiring vectorization of the image matrix. However, it does not use the class label information in the data during feature extraction. In order to fully utilize the class label information for feature extraction, this letter proposes a new method called bidirectional two-dimensional supervised multiset canonical correlation analysis ((2D)<span><math><msup><mrow></mrow><mrow><mn>2</mn></mrow></msup></math></span>SMCCA). The basic idea of (2D)<span><math><msup><mrow></mrow><mrow><mn>2</mn></mrow></msup></math></span>SMCCA is to replace the equality constraints with the inequality constraints, maximizing the correlation of multiple sets of data while maximizing the inter-class scatter and minimizing the intra-class scatter for intra-group data. Experiments on face image and object image databases show that the proposed method has good recognition performance, while the extracted features have strong discriminative ability.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 37-43"},"PeriodicalIF":3.9,"publicationDate":"2025-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144670867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application and optimization of lightweight visual SLAM in dynamic industrial environment","authors":"Zhendong Guo , Na Dong , Shuai Liu , Donghui Li , Wai Hung Ip , Kai Leung Yung","doi":"10.1016/j.patrec.2025.06.021","DOIUrl":"10.1016/j.patrec.2025.06.021","url":null,"abstract":"<div><div>With the increasing adoption of visual SLAM in industrial automation, maintaining real-time performance and robustness in dynamic environments presents a significant challenge. Traditional SLAM systems often struggle with interference from moving objects and real-time processing on resource-constrained devices, resulting in accuracy issues. This paper introduces a lightweight object detection algorithm that employs spatial-channel decoupling for efficient removal of dynamic objects. It utilizes Region-Adaptive Deformable Convolution (RAD-Conv) to minimize computational complexity and incorporates a lightweight Convolutional Neural Network(CNN) architecture to enhance real-time performance and accuracy. Additionally, a novel loop closure detection method improves localization accuracy by mitigating cumulative errors. Experimental results demonstrate the system’s exceptional real-time performance, accuracy, and robustness in complex industrial scenarios, providing a promising solution for visual SLAM in industrial automation.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"196 ","pages":"Pages 319-327"},"PeriodicalIF":3.9,"publicationDate":"2025-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144631125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhanced railway sinkhole detection and characterization using LiDAR-based 3D modeling and geometric analysis","authors":"Maryem Bouali , Fakhreddine Ababsa , Muhammad Ali Sammuneh , Rani El Meouche , Bahar Salavati , Flavien Viguier","doi":"10.1016/j.patrec.2025.06.015","DOIUrl":"10.1016/j.patrec.2025.06.015","url":null,"abstract":"<div><div>This paper addresses the complex problem of modeling sinkholes in railway infrastructure using LiDAR (Light Detection and Ranging) point cloud data. Building upon previous research on sinkhole detection with Digital Elevation Models (DEMs), this study evaluates multiple geometric models — an inverted Gaussian, conical, and cylindrical — to determine their effectiveness in accurately representing sinkhole structures. The inverted Gaussian model emerged as the most accurate. The study focused on optimizing the model by identifying and fine-tuning the parameters that most significantly influence detection performance. This led to substantial improvements in detection accuracy, with precision increasing from 55% to 61%, knowing that the recall reached 90%. The proposed methodology enhances railway safety by providing precise, scalable, and robust sinkhole detection, with future improvements aimed at generating synthetic datasets for deep learning model training, further advancing detection capabilities.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 378-384"},"PeriodicalIF":3.3,"publicationDate":"2025-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145264781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}