Sclera-TransFuse: Fusing Vision Transformer and CNN for Accurate Sclera Segmentation and Recognition

Impact factor (IF): 5

IEEE Transactions on Biometrics, Behavior, and Identity Science. Pub Date: 2024-06-17. DOI: 10.1109/TBIOM.2024.3415484

Caiyong Wang;Haiqing Li;Yixin Zhang;Guangzhe Zhao;Yunlong Wang;Zhenan Sun

Volume 6, Issue 4, pp. 575-590. https://ieeexplore.ieee.org/document/10559402/


Abstract

This paper investigates a unified deep-learning framework for accurate sclera segmentation and recognition, named Sclera-TransFuse. Unlike previous CNN-based methods, our framework combines a Vision Transformer with a CNN to extract complementary feature representations that benefit both subtasks. For sclera segmentation, we develop a novel two-stream hybrid model, Sclera-TransFuse-Seg, which runs a classical ResNet-34 encoder and a more recent Swin Transformer encoder in parallel. The dual encoders first extract coarse- and fine-grained feature representations separately at hierarchical stages. A Cross-Domain Fusion (CDF) module, based on information interaction and a self-attention mechanism, then efficiently fuses the multi-scale features from the two encoders. Finally, the decoder progressively upsamples and aggregates the fused features to predict the sclera masks, while deep supervision strategies help the model learn intermediate feature representations better and faster. The segmentation results are used to generate the sclera ROI image for sclera feature extraction. For recognition, we propose a new model, Sclera-TransFuse-Rec, which combines a lightweight EfficientNet-B0 and a multi-scale Vision Transformer in sequence to encode local and global sclera vasculature feature representations. Extensive experiments on several publicly available databases suggest that our framework consistently achieves state-of-the-art performance on various sclera segmentation and recognition benchmarks, including the 8th Sclera Segmentation and Recognition Benchmarking Competition (SSRBC 2023). A UBIRIS.v2 subset of 683 eye images with manually labeled sclera masks and our code are publicly available to the community at https://github.com/lhqqq/Sclera-TransFuse.
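The abstract states that a sclera ROI image is generated from the segmentation result but does not say how. The sketch below shows one plausible way to do it, under assumptions that are not taken from the paper: the predicted mask is binary, non-sclera pixels are set to a background value, and the result is cropped to the mask's bounding box. The names `mask_bounding_box` and `extract_roi` are hypothetical, and nested lists stand in for image arrays to keep the example dependency-free.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel intensities
Mask = List[List[int]]   # binary mask: 1 = sclera, 0 = background


def mask_bounding_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return the inclusive (top, left, bottom, right) box of nonzero pixels, or None."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # the mask contains no sclera pixels
    cols = [c for row in mask for c, v in enumerate(row) if v]
    return rows[0], min(cols), rows[-1], max(cols)


def extract_roi(image: Image, mask: Mask, background: int = 0) -> Image:
    """Assumed ROI step: blank out non-sclera pixels, then crop to the mask's box."""
    if len(image) != len(mask) or any(len(a) != len(b) for a, b in zip(image, mask)):
        raise ValueError("image and mask must have the same shape")
    box = mask_bounding_box(mask)
    if box is None:
        return []
    top, left, bottom, right = box
    return [
        [image[r][c] if mask[r][c] else background for c in range(left, right + 1)]
        for r in range(top, bottom + 1)
    ]


if __name__ == "__main__":
    image = [
        [10, 20, 30, 40],
        [50, 60, 70, 80],
        [90, 100, 110, 120],
    ]
    mask = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0],
    ]
    print(extract_roi(image, mask))  # [[60, 70], [0, 110]]
```

In practice an array library would replace the nested lists. The released code at the repository linked above is the authoritative reference for the actual ROI procedure.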


Source journal

IEEE transactions on biometrics, behavior, and identity science

CiteScore

10.90

Self-citation rate

0.00%

Number of publications