HTCSigNet: A Hybrid Transformer and Convolution Signature Network for offline signature verification
Authors: Lidong Zheng, Da Wu, Shengjie Xu, Yuchen Zheng
Journal: Pattern Recognition, Volume 159, Article 111146 (published 2024-11-05)
DOI: 10.1016/j.patcog.2024.111146
URL: https://www.sciencedirect.com/science/article/pii/S0031320324008975
Journal impact factor: 7.5 (JCR Q1, Computer Science, Artificial Intelligence)
Citations: 0
Abstract
For Offline Handwritten Signature Verification (OHSV) tasks, it is difficult for traditional Convolutional Neural Networks (CNNs) or transformers alone to capture both the global and the local features of signatures, and single-depth models often suffer from overfitting and poor generalization. To overcome these difficulties, this paper proposes a novel Hybrid Transformer and Convolution Signature Network (HTCSigNet) that captures multi-scale features from signatures. Specifically, HTCSigNet consists of two parts, a transformer-based block and a CNN-based block, which extract global and local features from signatures, respectively. The CNN-based block comprises three modules: a Space-to-depth Convolution (SPD-Conv) module, which improves feature learning by focusing precisely on signature strokes; a Spatial and Channel Reconstruction Convolution (SCConv) module, which improves generalization by attending to more distinctive micro-deformation features while reducing attention to common features; and a convolution module, which extracts the shape and morphology of specific strokes and other local features. The transformer-based block uses a Vision Transformer (ViT) to extract the overall shape, layout, general direction, and other global features of signatures. After the feature learning stage, Writer-Dependent (WD) and Writer-Independent (WI) verification systems are constructed to evaluate the performance of HTCSigNet. Extensive experiments on four public signature datasets, GPDSsynthetic, CEDAR, UTSig, and BHSig260 (Bengali and Hindi), demonstrate that HTCSigNet learns representations that discriminate between genuine and skilled forged signatures and achieves state-of-the-art or competitive performance compared with advanced verification systems. Furthermore, HTCSigNet transfers easily to OHSV datasets in different languages.
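The two branches described above look at the same signature image in very different ways. As an illustration only, the NumPy sketch below shows the input rearrangement at the front of each branch. It includes the space-to-depth step that SPD-Conv uses to downsample without discarding stroke pixels, followed by a 1x1 convolution standing in for its non-strided convolution. It also shows the non-overlapping patch tokenization that a ViT applies before its linear embedding. The function names, the 64x128 image size, the 16-pixel patch size, and the 1x1 kernel are all assumptions chosen for the example; they are not details taken from HTCSigNet.

```python
import numpy as np


def space_to_depth(x: np.ndarray, scale: int = 2) -> np.ndarray:
    """Rearrange a (C, H, W) map into (C * scale**2, H // scale, W // scale).

    Output channels are the sub-maps x[:, i::scale, j::scale] stacked along
    the channel axis, so no pixel is discarded while resolution drops.
    """
    c, h, w = x.shape
    if h % scale or w % scale:
        raise ValueError("H and W must be divisible by scale")
    # Split each spatial axis into (block index, offset inside the block).
    x = x.reshape(c, h // scale, scale, w // scale, scale)
    # Put the two offsets in front of the channel axis, then merge them.
    x = x.transpose(2, 4, 0, 1, 3)
    return x.reshape(scale * scale * c, h // scale, w // scale)


def pointwise_conv(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Stride-1 1x1 convolution: mixes channels at every spatial position.

    Hypothetical stand-in for the non-strided convolution that follows the
    space-to-depth step; weight has shape (C_out, C_in).
    """
    return np.einsum("oc,chw->ohw", weight, x)


def patchify(img: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split a (C, H, W) image into N = (H/patch) * (W/patch) flattened,
    non-overlapping patches of length patch * patch * C (row-major order)."""
    c, h, w = img.shape
    if h % patch or w % patch:
        raise ValueError("H and W must be divisible by patch")
    gh, gw = h // patch, w // patch
    x = img.reshape(c, gh, patch, gw, patch)
    # Reorder to (grid row, grid col, row in patch, col in patch, channel).
    x = x.transpose(1, 3, 2, 4, 0)
    return x.reshape(gh * gw, patch * patch * c)


rng = np.random.default_rng(0)
signature = rng.random((1, 64, 128))  # hypothetical 1-channel 64x128 image

local_in = space_to_depth(signature)                           # CNN-branch input
mixed = pointwise_conv(local_in, rng.standard_normal((8, 4)))  # channel mixing
tokens = patchify(signature, patch=16)                         # ViT-branch tokens
print(local_in.shape, mixed.shape, tokens.shape)
# (4, 32, 64) (8, 32, 64) (32, 256)
```

Both rearrangements only reorder values. `space_to_depth` keeps every stroke pixel while halving the spatial resolution, which is the usual argument for SPD-Conv over strided convolution or pooling. `patchify` produces the token sequence over which a ViT's self-attention can relate distant parts of a signature.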
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.