Scene Text Recognition with Multi-decoders

2021 21st International Conference on Control, Automation and Systems (ICCAS). Pub Date: 2021-10-12. DOI: 10.23919/ICCAS52745.2021.9649998

Yao Wang, J. Ha

Citations: 1

Abstract

In this article, we focus on scene text recognition, one of the challenging subfields of computer vision because text appears unpredictably in natural scenes. Recently, advances in deep learning have pushed scene text recognition to state-of-the-art performance. The encoder-decoder architecture, which consists of a feature extractor and a sequence module, is now widely used for scene text recognition tasks. At the decoder, connectionist temporal classification (CTC), the attention mechanism, and the transformer (self-attention) are the three main approaches in recent research. A CTC decoder is flexible and can handle sequences whose lengths vary widely, because it aligns sequence features with labels in a frame-wise manner. An attention decoder can learn better and deeper feature representations and obtain better position information for each character, which gives more robust and accurate performance on both regular and irregular scene text. Moreover, we introduce a novel decoder mechanism. The proposed architecture has several advantages: the model can be trained end to end with multiple decoders, and it can handle sequences of arbitrary length and images of arbitrary shape. Extensive experiments on standard benchmarks demonstrate that our model improves performance on both regular and irregular text recognition.
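The frame-wise alignment that the abstract attributes to the CTC decoder can be illustrated with a minimal sketch of standard greedy CTC decoding. This is not the paper's multi-decoder model: the function name `ctc_greedy_decode`, the toy alphabet, the blank index, and the score values are all assumptions made for illustration. The sketch picks the best class for each frame, merges consecutive repeats, and drops the blank symbol, which is why the output length is not tied to the number of frames.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (illustrative choice)


def ctc_greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Turn per-frame class scores into a string using greedy CTC decoding."""
    # Step 1: pick the highest-scoring class index for every frame.
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    # Step 2: merge runs of identical consecutive predictions into one.
    collapsed = [index for index, _ in groupby(best)]
    # Step 3: drop blanks; what remains is the predicted label sequence.
    return "".join(alphabet[index] for index in collapsed if index != blank)


# Hypothetical toy data: 6 frames over a 4-symbol alphabet ("-" is the blank).
alphabet = ["-", "a", "c", "t"]
frames = [
    [0.10, 0.10, 0.70, 0.10],  # c
    [0.10, 0.10, 0.70, 0.10],  # c (repeat, merged with the previous frame)
    [0.80, 0.10, 0.05, 0.05],  # blank
    [0.10, 0.70, 0.10, 0.10],  # a
    [0.10, 0.10, 0.10, 0.70],  # t
    [0.10, 0.10, 0.10, 0.70],  # t (repeat, merged with the previous frame)
]

print(ctc_greedy_decode(frames, alphabet))  # cat
```

Six frames decode to a three-character string here. A blank between two identical predictions keeps them separate, which is how CTC can emit doubled letters.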
