IEEE Transactions on Pattern Analysis and Machine Intelligence: Latest Articles

CNN2GNN: How to Bridge CNN with GNN.
IEEE Transactions on Pattern Analysis and Machine Intelligence | Pub Date: 2025-07-03 | DOI: 10.1109/TPAMI.2025.3583357
Ziheng Jiao, Hongyuan Zhang, Xuelong Li
{"title":"CNN2GNN: How to Bridge CNN with GNN.","authors":"Ziheng Jiao, Hongyuan Zhang, Xuelong Li","doi":"10.1109/TPAMI.2025.3583357","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3583357","url":null,"abstract":"<p><p>Thanks to extracting the intra-sample representation, the convolution neural network (CNN) has achieved excellent performance in vision tasks. However, its numerous convolutional layers take a higher training expense. Recently, graph neural networks (GNN), a bilinear model, have succeeded in exploring the underlying topological relationship among the graph data with a few graph neural layers. Unfortunately, due to the lack of graph structure and high-cost inference on large-scale scenarios, it cannot be directly utilized on non-graph data. Inspired by these complementary strengths and weaknesses, we discuss a natural question, how to bridge these two heterogeneous networks? In this paper, we propose a novel CNN2GNN framework to unify CNN and GNN together via distillation. Firstly, to break the limitations of GNN, we design a differentiable sparse graph learning module as the head of the networks. It can dynamically learn the graph for inductive learning. Then, a response-based distillation is introduced to transfer the knowledge and bridge these two heterogeneous networks. Notably, due to extracting the intra-sample representation of a single instance and the topological relationship among the datasets simultaneously, the performance of the distilled \"boosted\" two-layer GNN on Mini-ImageNet is much higher than CNN containing dozens of layers, such as ResNet152.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144562425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Temporal Feature Matters: A Framework for Diffusion Model Quantization.
IEEE Transactions on Pattern Analysis and Machine Intelligence | Pub Date: 2025-07-03 | DOI: 10.1109/TPAMI.2025.3585692
Yushi Huang, Ruihao Gong, Xianglong Liu, Jing Liu, Yuhang Li, Jiwen Lu, Dacheng Tao
{"title":"Temporal Feature Matters: A Framework for Diffusion Model Quantization.","authors":"Yushi Huang, Ruihao Gong, Xianglong Liu, Jing Liu, Yuhang Li, Jiwen Lu, Dacheng Tao","doi":"10.1109/TPAMI.2025.3585692","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3585692","url":null,"abstract":"<p><p>Diffusion models, widely used for image generation, face significant challenges related to their broad applicability due to prolonged inference times and high memory demands. Efficient Post-Training Quantization (PTQ) is crucial to address these issues. However, unlike traditional models, diffusion models critically rely on the timestep for the multi-round denoising. Typically, each timestep is encoded into a hypersensitive temporal feature by several modules. Despite this, existing PTQ methods do not optimize these modules individually. Instead, they employ unsuitable reconstruction objectives and complex calibration methods, leading to significant disturbances in the temporal feature and denoising trajectory, as well as reduced compression efficiency. To address these challenges, we introduce a novel quantization framework that includes three strategies: 1) TIB-based Maintenance: Based on our innovative Temporal Information Block (TIB) definition, Temporal Information-aware Reconstruction (TIAR) and Finite Set Calibration (FSC) are developed to efficiently align original temporal features. 2) Cache-based Maintenance: Instead of indirect and complex optimization for the related modules, pre-computing and caching quantized counterparts of temporal features are developed to minimize errors. 3) Disturbance-aware Selection: Employ temporal feature errors to guide a fine-grained selection between the two maintenance strategies for further disturbance reduction. This framework preserves most of the temporal information and ensures high-quality end-to-end generation. Extensive testing on various datasets, diffusion models, and hardware confirms our superior performance and acceleration.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144562427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Reinforced Embodied Active Defense: Exploiting Adaptive Interaction for Robust Visual Perception in Adversarial 3D Environments.
IEEE Transactions on Pattern Analysis and Machine Intelligence | Pub Date: 2025-07-03 | DOI: 10.1109/TPAMI.2025.3585726
Xiao Yang, Lingxuan Wu, Lizhong Wang, Chengyang Ying, Hang Su, Jun Zhu
{"title":"Reinforced Embodied Active Defense: Exploiting Adaptive Interaction for Robust Visual Perception in Adversarial 3D Environments.","authors":"Xiao Yang, Lingxuan Wu, Lizhong Wang, Chengyang Ying, Hang Su, Jun Zhu","doi":"10.1109/TPAMI.2025.3585726","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3585726","url":null,"abstract":"<p><p>Adversarial attacks in 3D environments have emerged as a critical threat to the reliability of visual perception systems, particularly in safety-sensitive applications such as identity verification and autonomous driving. These attacks employ adversarial patches and 3D objects to manipulate deep neural network (DNN) predictions by exploiting vulnerabilities within complex scenes. Existing defense mechanisms, such as adversarial training and purification, primarily employ passive strategies to enhance robustness. However, these approaches often rely on pre-defined assumptions about adversarial tactics, limiting their adaptability in dynamic 3D settings. To address these challenges, we introduce Reinforced Embodied Active Defense (REIN-EAD), a proactive defense framework that leverages adaptive exploration and interaction with the environment to improve perception robustness in 3D adversarial contexts. By implementing a multi-step objective that balances immediate prediction accuracy with predictive entropy minimization, REIN-EAD optimizes defense strategies over a multi-step horizon. Additionally, REIN-EAD involves an uncertainty-oriented reward-shaping mechanism that facilitates efficient policy updates, thereby reducing computational overhead and supporting real-world applicability without the need for differentiable environments. Comprehensive experiments validate the effectiveness of REIN-EAD, demonstrating a substantial reduction in attack success rates while preserving standard accuracy across diverse tasks. Notably, REIN-EAD exhibits robust generalization to unseen and adaptive attacks, making it suitable for real-world complex tasks, including 3D object classification, face recognition and autonomous driving. By integrating proactive policy learning with embodied scene interaction, REIN-EAD establishes a scalable and adaptable approach for securing DNN-based perception systems in dynamic and adversarial 3D environments.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144562426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
ComS2T: A Complementary Spatiotemporal Learning System for Data-Adaptive Model Evolution.
IEEE Transactions on Pattern Analysis and Machine Intelligence | Pub Date: 2025-06-05 | DOI: 10.1109/TPAMI.2025.3576805
Zhengyang Zhou, Qihe Huang, Binwu Wang, Jianpeng Hou, Kuo Yang, Yuxuan Liang, Yu Zheng, Yang Wang
{"title":"ComS2T: A Complementary Spatiotemporal Learning System for Data-Adaptive Model Evolution.","authors":"Zhengyang Zhou, Qihe Huang, Binwu Wang, Jianpeng Hou, Kuo Yang, Yuxuan Liang, Yu Zheng, Yang Wang","doi":"10.1109/TPAMI.2025.3576805","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3576805","url":null,"abstract":"<p><p>Spatiotemporal (ST) learning has become a crucial technique to enable smart cities and sustainable urban development. Current ST learning models capture the heterogeneity via various spatial convolution and temporal evolution blocks. However, rapid urbanization leads to fluctuating distributions in urban data and city structures, resulting in existing methods suffering generalization and data adaptation issues. Despite efforts, existing methods fail to deal with newly arrived observations, and the limitation of those methods with generalization capacity lies in the repeated training that leads to inconvenience, inefficiency and resource waste. Motivated by complementary learning in neuroscience, we introduce a prompt-based complementary spatiotemporal learning termed ComS2T, to empower the evolution of models for data adaptation. We first disentangle the neural architecture into two disjoint structures, a stable neocortex for consolidating historical memory, and a dynamic hippocampus for new knowledge update. Then we train the dynamic spatial and temporal prompts by characterizing distribution of main observations to enable prompts adaptive to new data. This data-adaptive prompt mechanism, combined with a two-stage training process, facilitates fine-tuning of the neural architecture conditioned on prompts, thereby enabling efficient adaptation during testing. Extensive experiments validate the efficacy of ComS2T in adapting various spatiotemporal out-of-distribution scenarios while maintaining effective inferences. The code is available on https://github.com/hqh0728/ComS2T.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144236287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Guest Editorial: Introduction to the Special Section on Large-Scale Multimodal Learning: Universality, Robustness, Efficiency, and Beyond
IEEE Transactions on Pattern Analysis and Machine Intelligence | Pub Date: 2025-06-05 | DOI: 10.1109/TPAMI.2025.3562938
Peng Xu;Song Bai;Bowen Zhou;David Clifton;Andrea Vedaldi;Mihaela van der Schaar;Luc Van Gool
{"title":"Guest Editorial: Introduction to the Special Section on Large-Scale Multimodal Learning: Universality, Robustness, Efficiency, and Beyond","authors":"Peng Xu;Song Bai;Bowen Zhou;David Clifton;Andrea Vedaldi;Mihaela van der Schaar;Luc Van Gool","doi":"10.1109/TPAMI.2025.3562938","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3562938","url":null,"abstract":"","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 7","pages":"5127-5129"},"PeriodicalIF":0.0,"publicationDate":"2025-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11026038","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144219785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
DeepInteraction++: Multi-Modality Interaction for Autonomous Driving
IEEE Transactions on Pattern Analysis and Machine Intelligence | Pub Date: 2025-04-29 | DOI: 10.1109/TPAMI.2025.3565194
Zeyu Yang;Nan Song;Wei Li;Xiatian Zhu;Li Zhang;Philip H.S. Torr
{"title":"DeepInteraction++: Multi-Modality Interaction for Autonomous Driving","authors":"Zeyu Yang;Nan Song;Wei Li;Xiatian Zhu;Li Zhang;Philip H.S. Torr","doi":"10.1109/TPAMI.2025.3565194","DOIUrl":"10.1109/TPAMI.2025.3565194","url":null,"abstract":"Existing top-performance autonomous driving systems typically rely on the <italic>multi-modal fusion</i> strategy for reliable scene understanding. This design is however fundamentally restricted due to overlooking the modality-specific strengths and finally hampering the model performance. To address this limitation, in this work, we introduce a novel <italic>modality interaction</i> strategy that allows individual per-modality representations to be learned and maintained throughout, enabling their unique characteristics to be exploited during the whole perception pipeline. To demonstrate the effectiveness of the proposed strategy, we design <italic>DeepInteraction++</i>, a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder. Specifically, the encoder is implemented as a dual-stream Transformer with specialized attention operation for information exchange and integration between separate modality-specific representations. Our multi-modal representational learning incorporates both object-centric, precise sampling-based feature alignment and global dense information spreading, essential for the more challenging planning task. The decoder is designed to iteratively refine the predictions by alternately aggregating information from separate representations in a unified modality-agnostic manner, realizing multi-modal predictive interaction. Extensive experiments demonstrate the superior performance of the proposed framework on both 3D object detection and end-to-end autonomous driving tasks.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 8","pages":"6749-6763"},"PeriodicalIF":0.0,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143889993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
On Positive-Unlabeled Classification From Corrupted Data in GANs
IEEE Transactions on Pattern Analysis and Machine Intelligence | Pub Date: 2025-04-29 | DOI: 10.1109/TPAMI.2025.3565394
Yunke Wang;Chang Xu;Tianyu Guo;Bo Du;Dacheng Tao
{"title":"On Positive-Unlabeled Classification From Corrupted Data in GANs","authors":"Yunke Wang;Chang Xu;Tianyu Guo;Bo Du;Dacheng Tao","doi":"10.1109/TPAMI.2025.3565394","DOIUrl":"10.1109/TPAMI.2025.3565394","url":null,"abstract":"This paper defines a positive and unlabeled classification problem for standard GANs, which then leads to a novel technique to stabilize the training of the discriminator in GANs and deal with corrupted data. Traditionally, real data are taken as positive while generated data are negative. This positive-negative classification criterion was kept fixed all through the learning process of the discriminator without considering the gradually improved quality of generated data, even if they could be more realistic than real data at times. In contrast, it is more reasonable to treat the generated data as unlabeled, which could be positive or negative according to their quality. The discriminator is thus a classifier for this positive and unlabeled classification problem, and we derive a new Positive-Unlabeled GAN (PUGAN). We theoretically discuss the global optimality the proposed model will achieve and the equivalent optimization goal. Empirically, we find that PUGAN can achieve comparable or even better performance than those sophisticated discriminator stabilization methods. Considering the potential corrupted data problem in real-world scenarios, we further extend our approach to PUGAN-C, which treats real data as unlabeled that accounts for both clean and corrupted instances, and generated data as positive. The samples from generator could be closer to those corrupted data within unlabeled data at first, but within the framework of adversarial training, the generator will be optimized to cheat the discriminator and produce samples that are similar to those clean data. Experimental results on image generation from several corrupted datasets demonstrate the effectiveness and generalization of PUGAN-C.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 8","pages":"6859-6875"},"PeriodicalIF":0.0,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143889994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Detecting Every Object From Events
IEEE Transactions on Pattern Analysis and Machine Intelligence | Pub Date: 2025-04-28 | DOI: 10.1109/TPAMI.2025.3565102
Haitian Zhang;Chang Xu;Xinya Wang;Bingde Liu;Guang Hua;Lei Yu;Wen Yang
{"title":"Detecting Every Object From Events","authors":"Haitian Zhang;Chang Xu;Xinya Wang;Bingde Liu;Guang Hua;Lei Yu;Wen Yang","doi":"10.1109/TPAMI.2025.3565102","DOIUrl":"10.1109/TPAMI.2025.3565102","url":null,"abstract":"Object detection is critical in autonomous driving, and it is more practical yet challenging to localize objects of unknown categories: an endeavour known as Class-Agnostic Object Detection (CAOD). Existing studies on CAOD predominantly rely on RGB cameras, but these frame-based sensors usually have high latency and limited dynamic range, leading to safety risks under extreme conditions like fast-moving objects, overexposure, and darkness. In this study, we turn to the event-based vision, featured by its sub-millisecond latency and high dynamic range, for robust CAOD. We propose Detecting Every Object in Events (DEOE), an approach aimed at achieving high-speed, class-agnostic object detection in event-based vision. Built upon the fast event-based backbone: recurrent vision transformer, we jointly consider the spatial and temporal consistencies to identify potential objects. The discovered potential objects are assimilated as soft positive samples to avoid being suppressed as backgrounds. Moreover, we introduce a disentangled objectness head to separate the foreground-background classification and novel object discovery tasks, enhancing the model's generalization in localizing novel objects while maintaining a strong ability to filter out the background. Extensive experiments confirm the superiority of our proposed DEOE in both open-set and closed-set settings, outperforming strong baseline methods.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 8","pages":"7171-7178"},"PeriodicalIF":0.0,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143884399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Weakly Supervised Micro- and Macro-Expression Spotting Based on Multi-Level Consistency
IEEE Transactions on Pattern Analysis and Machine Intelligence | Pub Date: 2025-04-28 | DOI: 10.1109/TPAMI.2025.3564951
Wang-Wang Yu;Kai-Fu Yang;Hong-Mei Yan;Yong-Jie Li
{"title":"Weakly Supervised Micro- and Macro-Expression Spotting Based on Multi-Level Consistency","authors":"Wang-Wang Yu;Kai-Fu Yang;Hong-Mei Yan;Yong-Jie Li","doi":"10.1109/TPAMI.2025.3564951","DOIUrl":"10.1109/TPAMI.2025.3564951","url":null,"abstract":"Most micro- and macro-expression spotting methods in untrimmed videos suffer from the burden of video-wise collection and frame-wise annotation. Weakly supervised expression spotting (WES) based on video-level labels can potentially mitigate the complexity of frame-level annotation while achieving fine-grained frame-level spotting. However, we argue that existing weakly supervised methods are based on multiple instance learning (MIL) involving inter-modality, inter-sample, and inter-task gaps. The inter-sample gap is primarily from the sample distribution and duration. Therefore, we propose a novel and simple WES framework, MC-WES, using multi-consistency collaborative mechanisms that include modal-level saliency, video-level distribution, label-level duration and segment-level feature consistency strategies to implement fine frame-level spotting with only video-level labels to alleviate the above gaps and merge prior knowledge. The modal-level saliency consistency strategy focuses on capturing key correlations between raw images and optical flow. The video-level distribution consistency strategy utilizes the difference of sparsity in temporal distribution. The label-level duration consistency strategy exploits the difference in the duration of facial muscles. The segment-level feature consistency strategy emphasizes that features under the same labels maintain similarity. Experimental results on three challenging datasets–CAS(ME)<inline-formula><tex-math>$^{2}$</tex-math></inline-formula>, CAS(ME)<inline-formula><tex-math>$^{3}$</tex-math></inline-formula>, and SAMM-LV–demonstrate that MC-WES is comparable to state-of-the-art fully supervised methods.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 8","pages":"6912-6928"},"PeriodicalIF":0.0,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143884408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Normalized-Full-Palmar-Hand: Toward More Accurate Hand-Based Multimodal Biometrics
IEEE Transactions on Pattern Analysis and Machine Intelligence | Pub Date: 2025-04-28 | DOI: 10.1109/TPAMI.2025.3564514
Yitao Qiao;Wenxiong Kang;Dacan Luo;Junduan Huang
{"title":"Normalized-Full-Palmar-Hand: Toward More Accurate Hand-Based Multimodal Biometrics","authors":"Yitao Qiao;Wenxiong Kang;Dacan Luo;Junduan Huang","doi":"10.1109/TPAMI.2025.3564514","DOIUrl":"10.1109/TPAMI.2025.3564514","url":null,"abstract":"Hand-based multimodal biometrics have attracted significant attention due to their high security and performance. However, existing methods fail to adequately decouple various hand biometric traits, limiting the extraction of unique features. Moreover, effective feature extraction for multiple hand traits remains a challenge. To address these issues, we propose a novel method for the precise decoupling of hand multimodal features called ‘Normalized-Full-Palmar-Hand’ and construct an authentication system based on this method. First, we propose HSANet, which accurately segments various hand regions with diverse backgrounds based on low-level details and high-level semantic information. Next, we establish two hand multimodal biometric databases with HSANet: SCUT Normalized-Full-Palmar-Hand Database Version 1 (SCUT_NFPH_v1) and Version 2 (SCUT_NFPH_v2). These databases include full hand images, semantic masks, and images of various hand biometric traits obtained from the same individual at the same scale, totaling 157,500 images. Third, we propose the Full Palmar Hand Authentication Network framework (FPHandNet) to extract unique features of multiple hand biometric traits. Finally, extensive experimental results, performed via the publicly available CASIA, IITD, COEP databases, and our proposed databases, validate the effectiveness of our methods.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 8","pages":"6715-6730"},"PeriodicalIF":0.0,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143884390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0