Federated micro-expression mining and multi-modal metadata fusion for Deepfake fraud detection in ubiquitous financial video-KYC systems at IoT network
{"title":"Federated micro-expression mining and multi-modal metadata fusion for Deepfake fraud detection in ubiquitous financial video-KYC systems at IoT network","authors":"Romil Rawat , Anjali Rawat , Shweta Gupta , A. Samson Arun Raj , T.M. Thiyagu , Hitesh Rawat , Anand Rajavat","doi":"10.1016/j.fraope.2026.100523","DOIUrl":null,"url":null,"abstract":"<div><div><strong>Introduction & Problem Statement-</strong> The increasing sophistication of AI-generated deepfakes poses significant challenges for financial video-KYC systems, where identity verification relies on accurate and real-time analysis of user biometrics. Traditional centralized and unimodal detection models struggle to balance accuracy, privacy, and deployment scalability, particularly across heterogeneous IoT edge devices. <strong>Need for Research-</strong>There is a pressing need for privacy-preserving, scalable, and robust deepfake detection mechanisms capable of identifying subtle manipulations in real-world financial environments. Current solutions often fail under domain-shift conditions, low-resolution inputs, or in scenarios involving complex micro-expression and behavioral cues. <strong>Proposed Work & Objective-</strong> This research proposes the <strong>Federated Micro-Expression Mining and Multi-Modal Metadata Fusion (FED-MEMF)</strong> framework, designed to accurately detect deepfake fraud in decentralized video-KYC systems. The objectives are to (i) enhance detection accuracy by leveraging facial micro-expression dynamics, audio signals, and session metadata, and (ii) preserve user privacy through federated learning while ensuring low-latency real-time inference. <strong>Novelty-</strong> The novelty lies in integrating fine-grained micro-expression analysis with behavioral metadata fusion in a <strong>federated learning environment</strong>, combined with cross-modal attention mechanisms. 
This approach enables robust detection across multiple datasets while maintaining privacy and edge-device compatibility. <strong>Method-</strong> The framework employs modality-specific encoders—μ-Transformer for micro-expressions, CNN for audio, and LSTM for metadata—with features fused via a cross-modal attention engine. Federated Averaging (FedAvg) aggregates local model updates from IoT edge devices without transferring sensitive data. Quantization and hardware optimizations enable real-time performance on low-power devices. <strong>Dataset-</strong> Experiments utilized <strong>FaceForensics++, CAS(ME)^2</strong>, and a proprietary <strong>KYC-FinVox2024</strong> dataset comprising video, audio, and metadata streams, including micro-expression labels, to evaluate both intra- and cross-dataset performance. <strong>Results-</strong> FED-MEMF achieved an overall accuracy of <strong>98.7%</strong>, F1-score of <strong>0.987</strong>, AUC of <strong>0.996</strong>, and inference latency of <strong>82</strong> <strong>ms</strong>, outperforming XceptionNet, EfficientNet-B4, and CNN+LSTM baselines. Multi-modal fusion significantly reduced false positives and false negatives, demonstrating robustness under domain-shift conditions. <strong>Conclusion & Future Work-</strong> FED-MEMF provides a <strong>privacy-conscious, real-time, and scalable solution</strong> for deepfake detection in financial video-KYC applications. 
Future directions include multilingual audio-visual alignment, blockchain-enabled federated auditing, explainable AI integration, and deployment in other regulatory-sensitive sectors such as e-governance, healthcare, and remote education verification.</div></div>","PeriodicalId":100554,"journal":{"name":"Franklin Open","volume":"14 ","pages":"Article 100523"},"PeriodicalIF":0.0000,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Franklin Open","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2773186326000393","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2026/2/4 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Introduction & Problem Statement- The increasing sophistication of AI-generated deepfakes poses significant challenges for financial video-KYC systems, where identity verification relies on accurate, real-time analysis of user biometrics. Traditional centralized and unimodal detection models struggle to balance accuracy, privacy, and deployment scalability, particularly across heterogeneous IoT edge devices. Need for Research- There is a pressing need for privacy-preserving, scalable, and robust deepfake detection mechanisms capable of identifying subtle manipulations in real-world financial environments. Current solutions often fail under domain-shift conditions, on low-resolution inputs, or in scenarios involving complex micro-expression and behavioral cues. Proposed Work & Objective- This research proposes the Federated Micro-Expression Mining and Multi-Modal Metadata Fusion (FED-MEMF) framework, designed to accurately detect deepfake fraud in decentralized video-KYC systems. The objectives are to (i) enhance detection accuracy by leveraging facial micro-expression dynamics, audio signals, and session metadata, and (ii) preserve user privacy through federated learning while ensuring low-latency real-time inference. Novelty- The novelty lies in integrating fine-grained micro-expression analysis with behavioral metadata fusion in a federated learning environment, combined with cross-modal attention mechanisms. This approach enables robust detection across multiple datasets while maintaining privacy and edge-device compatibility. Method- The framework employs modality-specific encoders—μ-Transformer for micro-expressions, CNN for audio, and LSTM for metadata—with features fused via a cross-modal attention engine. Federated Averaging (FedAvg) aggregates local model updates from IoT edge devices without transferring sensitive data. Quantization and hardware optimizations enable real-time performance on low-power devices.
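The cross-modal attention fusion described in the method can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the feature vectors stand in for the outputs of the μ-Transformer, CNN, and LSTM encoders, and all dimensions, projection weights, and variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # shared embedding size (illustrative)

# Stand-ins for per-modality encoder outputs: micro-expression, audio,
# and session-metadata features of different native sizes.
feats = {
    "micro_expr": rng.normal(size=32),
    "audio": rng.normal(size=24),
    "metadata": rng.normal(size=8),
}

# Linear projections into a shared space (random placeholder weights).
proj = {k: rng.normal(size=(d, v.shape[0])) / np.sqrt(v.shape[0])
        for k, v in feats.items()}
tokens = np.stack([proj[k] @ feats[k] for k in feats])  # shape (3, d)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Scaled dot-product cross-modal attention: each modality token attends
# over all modality tokens, then the attended tokens are mean-pooled
# into one fused representation for the downstream classifier.
scores = tokens @ tokens.T / np.sqrt(d)   # (3, 3) attention logits
weights = softmax(scores, axis=-1)        # each row sums to 1
fused = weights @ tokens                  # (3, d) attended tokens
pooled = fused.mean(axis=0)               # (d,) fused feature vector
print(pooled.shape)
```

In the paper's framework, learned query/key/value projections and a trained fusion head would replace the random weights and mean pooling used here; the sketch only shows the attention mechanics.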
Dataset- Experiments utilized FaceForensics++, CAS(ME)^2, and a proprietary KYC-FinVox2024 dataset comprising video, audio, and metadata streams, including micro-expression labels, to evaluate both intra- and cross-dataset performance. Results- FED-MEMF achieved an overall accuracy of 98.7%, F1-score of 0.987, AUC of 0.996, and inference latency of 82 ms, outperforming XceptionNet, EfficientNet-B4, and CNN+LSTM baselines. Multi-modal fusion significantly reduced false positives and false negatives, demonstrating robustness under domain-shift conditions. Conclusion & Future Work- FED-MEMF provides a privacy-conscious, real-time, and scalable solution for deepfake detection in financial video-KYC applications. Future directions include multilingual audio-visual alignment, blockchain-enabled federated auditing, explainable AI integration, and deployment in other regulatory-sensitive sectors such as e-governance, healthcare, and remote education verification.
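The Federated Averaging (FedAvg) step named in the method — aggregating local model updates from edge devices without moving raw KYC data — reduces to a sample-count-weighted average of client parameters. A minimal sketch, with illustrative client weights and sample counts (all values here are stand-ins, not the paper's data):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """FedAvg: average client parameter vectors, weighted by each
    client's local sample count, to form the new global model."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three edge devices, each holding a locally trained parameter vector
# and a different number of local KYC sessions.
clients = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
sizes = [10, 30, 60]

global_w = fedavg(clients, sizes)
print(global_w)  # [4. 5.] — pulled toward the clients with more data
```

Only these aggregated parameters (not videos, audio, or metadata) leave the device, which is what gives the framework its privacy-preserving property.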