IrisFormer: A Dedicated Transformer Framework for Iris Recognition

IF 3.2 2区工程技术 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Signal Processing Letters Pub Date : 2024-12-26 DOI:10.1109/LSP.2024.3522856

Xianyun Sun;Caiyong Wang;Yunlong Wang;Jianze Wei;Zhenan Sun

{"title":"IrisFormer: A Dedicated Transformer Framework for Iris Recognition","authors":"Xianyun Sun;Caiyong Wang;Yunlong Wang;Jianze Wei;Zhenan Sun","doi":"10.1109/LSP.2024.3522856","DOIUrl":null,"url":null,"abstract":"While Vision Transformer (ViT)-based methods have significantly improved the performance of various vision tasks in natural scenes, progress in iris recognition remains limited. In addition, the human iris contains unique characters that are distinct from natural scenes. To remedy this, this paper investigates a dedicated Transformer framework, termed IrisFormer, for iris recognition and attempts to improve the accuracy by combining the contextual modeling ability of ViT and iris-specific optimization to learn robust, fine-grained, and discriminative features. Specifically, to achieve rotation invariance in iris recognition, we employ relative position encoding instead of regular absolute position encoding for each iris image token, and a horizontal pixel-shifting strategy is utilized during training for data augmentation. Then, to enhance the model's robustness against local distortions such as occlusions and reflections, we randomly mask some tokens during training to force the model to learn representative identity features from only part of the image. Finally, considering that fine-grained features are more discriminative in iris recognition, we retain the entire token sequence for patch-wise feature matching instead of using the standard single classification token. Experiments on three popular datasets demonstrate that the proposed framework achieves competitive performance under both intra- and inter-dataset testing protocols.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"431-435"},"PeriodicalIF":3.2000,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Signal Processing Letters","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10816462/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}

引用次数: 0

Abstract

While Vision Transformer (ViT)-based methods have significantly improved the performance of various vision tasks in natural scenes, progress in iris recognition remains limited. In addition, the human iris contains unique characters that are distinct from natural scenes. To remedy this, this paper investigates a dedicated Transformer framework, termed IrisFormer, for iris recognition and attempts to improve the accuracy by combining the contextual modeling ability of ViT and iris-specific optimization to learn robust, fine-grained, and discriminative features. Specifically, to achieve rotation invariance in iris recognition, we employ relative position encoding instead of regular absolute position encoding for each iris image token, and a horizontal pixel-shifting strategy is utilized during training for data augmentation. Then, to enhance the model's robustness against local distortions such as occlusions and reflections, we randomly mask some tokens during training to force the model to learn representative identity features from only part of the image. Finally, considering that fine-grained features are more discriminative in iris recognition, we retain the entire token sequence for patch-wise feature matching instead of using the standard single classification token. Experiments on three popular datasets demonstrate that the proposed framework achieves competitive performance under both intra- and inter-dataset testing protocols.

查看原文本刊更多论文

IrisFormer：虹膜识别专用变压器框架

尽管基于视觉变换（Vision Transformer, ViT）的方法显著提高了自然场景中各种视觉任务的性能，但虹膜识别的进展仍然有限。此外，人的虹膜包含着不同于自然场景的独特特征。为了解决这个问题，本文研究了一个专用的Transformer框架，称为IrisFormer，用于虹膜识别，并试图通过结合ViT的上下文建模能力和虹膜特定优化来提高准确性，以学习鲁棒性、细粒度和判别性特征。具体而言，为了实现虹膜识别中的旋转不变性，我们对每个虹膜图像标记采用相对位置编码而不是常规的绝对位置编码，并且在训练过程中使用水平像素移动策略进行数据增强。然后，为了增强模型对局部扭曲（如遮挡和反射）的鲁棒性，我们在训练过程中随机屏蔽一些标记，以迫使模型仅从部分图像中学习具有代表性的身份特征。最后，考虑到细粒度特征在虹膜识别中更具区别性，我们保留了整个标记序列进行贴片特征匹配，而不是使用标准的单个分类标记。在三个流行的数据集上的实验表明，该框架在数据集内部和数据集之间的测试协议下都具有竞争力。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

IEEE Signal Processing Letters 工程技术-工程：电子与电气

CiteScore

7.40

自引率

12.80%

发文量

339

审稿时长

2.8 months

期刊介绍： The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance in signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also in several workshop organized by the Signal Processing Society.