Transformer-based end-to-end scene text recognition
Authors: Xinghao Zhu, Zhi Zhang
Published in: 2021 IEEE 16th Conference on Industrial Electronics and Applications (ICIEA), pp. 1691-1695
Publication date: 2021-08-01
DOI: 10.1109/ICIEA51954.2021.9516154
URL: https://doi.org/10.1109/ICIEA51954.2021.9516154
Citations: 1
Abstract
Recognition of regular scene text has improved considerably in recent years, but irregular text remains difficult. Most current methods treat text detection and text recognition as two separate tasks. To better recognize irregular text, this paper proposes an end-to-end scene text recognition method based on a Transformer model. The method uses the attention mechanism for decoding, introduces a network that rectifies the input image, and extends the model with a bidirectional decoder. To evaluate the model, experiments are carried out on datasets such as SVT and ICDAR 2013. The experiments show that the method strikes a reasonable balance between complexity and accuracy and has clear performance advantages.
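The abstract does not say how the outputs of the bidirectional decoder are combined. One common choice in scene text recognition is to keep whichever reading the model scored as more likely. The sketch below illustrates that selection rule only. It is an assumption, not the paper's confirmed method, and the names `sequence_log_prob` and `pick_bidirectional` are hypothetical.

```python
import math
from typing import List, Tuple


def sequence_log_prob(char_probs: List[float]) -> float:
    """Sum of per-character log-probabilities (each probability must be > 0)."""
    return sum(math.log(p) for p in char_probs)


def pick_bidirectional(
    l2r: Tuple[str, List[float]],
    r2l: Tuple[str, List[float]],
) -> str:
    """Return the reading from the more confident decoder.

    l2r: (text, per-character probabilities) from the left-to-right decoder.
    r2l: (text in emission order, per-character probabilities) from the
         right-to-left decoder, i.e. the last character comes first.
    Assumed selection rule: higher total log-probability wins; ties go to l2r.
    """
    l2r_text, l2r_probs = l2r
    r2l_text, r2l_probs = r2l
    # The reverse decoder emits characters last-to-first, so flip its string.
    r2l_text = r2l_text[::-1]
    if sequence_log_prob(l2r_probs) >= sequence_log_prob(r2l_probs):
        return l2r_text
    return r2l_text


if __name__ == "__main__":
    # The forward decoder is unsure about its second character.
    forward = ("H0TEL", [0.9, 0.4, 0.9, 0.9, 0.9])
    backward = ("LETOH", [0.9, 0.9, 0.9, 0.9, 0.9])
    print(pick_bidirectional(forward, backward))
```

In this toy input the backward reading has the higher total log-probability, so its reversed string is returned. A real system would take these probabilities from the softmax outputs of the two decoders.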