IFViT: Interpretable Fixed-Length Representation for Fingerprint Matching via Vision Transformer

Impact factor 6.3 · Zone 1, Computer Science · JCR Q1, COMPUTER SCIENCE, THEORY & METHODS

IEEE Transactions on Information Forensics and Security · Published: 2024-12-18 · DOI: 10.1109/TIFS.2024.3520015

Yuhang Qiu, Honghui Chen, Xingbo Dong, Zheng Lin, Iman Yi Liao, Massimo Tistarelli, Zhe Jin

Volume 20, pages 559-573 · IEEE Xplore: https://ieeexplore.ieee.org/document/10806774/

Citations: 0

Abstract

Determining dense feature points on fingerprints, particularly at the pixel level, for constructing deep fixed-length representations that support accurate matching is of significant interest. To explore the interpretability of fingerprint matching, we propose a multi-stage interpretable fingerprint matching network, Interpretable Fixed-length Representation for Fingerprint Matching via Vision Transformer (IFViT), which consists of two primary modules. The first module, an interpretable dense registration module, builds a Vision Transformer (ViT)-based Siamese network to capture long-range dependencies and global context in fingerprint pairs. It provides interpretable dense pixel-wise correspondences of feature points for fingerprint alignment and enhances interpretability in the subsequent matching stage. The second module considers both local and global representations of the aligned fingerprint pair to achieve interpretable fixed-length representation extraction and matching. It reuses the ViTs trained in the first module, adds a fully connected layer, and retrains them to simultaneously produce a discriminative fixed-length representation and interpretable dense pixel-wise correspondences of feature points. Extensive experiments on diverse publicly available fingerprint databases demonstrate that the proposed framework not only achieves superior performance in dense registration and matching but also significantly improves the interpretability of fingerprint matching based on deep fixed-length representations.
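The abstract relies on two matching signals: dense pixel-wise correspondences between ViT features of a fingerprint pair, and a similarity score between fixed-length representations that accounts for both local and global information. The sketch below illustrates both ideas in plain Python under stated assumptions. Patch descriptors are given as lists of floats; correspondences are found with a dual-softmax mutual-nearest-neighbour rule (the coarse-matching rule popularised by LoFTR, swapped in here for illustration, since the abstract does not specify IFViT's exact rule); and local and global scores are fused by a hypothetical weighted average of cosine similarities. The function names and the `temperature`, `threshold`, and `weight` parameters are illustrative, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (a zero vector stays zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else [0.0 for _ in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two fixed-length vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def _softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dense_correspondences(
    feats_a: Sequence[Vector],
    feats_b: Sequence[Vector],
    temperature: float = 0.1,  # hypothetical value
    threshold: float = 0.2,    # hypothetical value
) -> List[Tuple[int, int, float]]:
    """Match patch descriptors of image A to image B (assumed dual-softmax rule).

    Returns (index_in_a, index_in_b, confidence) for mutual nearest
    neighbours whose dual-softmax confidence reaches `threshold`.
    """
    a = [l2_normalize(f) for f in feats_a]
    b = [l2_normalize(f) for f in feats_b]
    # Temperature-scaled similarity matrix, shape len(a) x len(b).
    sim = [[sum(x * y for x, y in zip(fa, fb)) / temperature for fb in b] for fa in a]
    row_sm = [_softmax(r) for r in sim]                  # softmax over B for each A patch
    col_sm = [_softmax(list(c)) for c in zip(*sim)]      # col_sm[j][i]: softmax over A for B patch j
    conf = [[row_sm[i][j] * col_sm[j][i] for j in range(len(b))] for i in range(len(a))]

    matches = []
    for i, row in enumerate(conf):
        j = max(range(len(row)), key=row.__getitem__)    # best B patch for A patch i
        column = [conf[k][j] for k in range(len(a))]
        best_i = max(range(len(a)), key=column.__getitem__)  # best A patch for B patch j
        if best_i == i and row[j] >= threshold:          # keep mutual, confident matches only
            matches.append((i, j, row[j]))
    return matches


def match_score(
    global_a: Vector,
    global_b: Vector,
    local_a: Vector,
    local_b: Vector,
    weight: float = 0.5,  # hypothetical fusion weight
) -> float:
    """Fuse global and local fixed-length similarities into one matching score."""
    return weight * cosine(global_a, global_b) + (1.0 - weight) * cosine(local_a, local_b)
```

For example, when two orthogonal descriptors appear in swapped order across the pair, `dense_correspondences([[1, 0], [0, 1]], [[0, 1], [1, 0]])` pairs patch 0 of the first image with patch 1 of the second and vice versa. In a real system the descriptors would come from the trained ViT feature maps, one per patch location, and the fixed-length vectors from the added fully connected layer.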

Source journal: IEEE Transactions on Information Forensics and Security (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 14.40

Self-citation rate: 7.40%

Articles published: 234

Review time: 6.5 months

About the journal: The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.