DeepText: Detecting Text from the Wild with Multi-ASPP-Assembled DeepLab

2019 International Conference on Document Analysis and Recognition (ICDAR), Pub Date: 2019-09-01, DOI: 10.1109/ICDAR.2019.00042

Qingqing Wang, W. Jia, Xiangjian He, Yue Lu, M. Blumenstein, Ye Huang, Shujing Lyu

Citations: 1

Abstract

In this paper, we address scene text detection via direct regression and successfully adapt an effective semantic segmentation model, DeepLab v3+ [1], to this task. To handle text of arbitrary orientation and size and to improve the recall of small text, we propose extracting multi-scale features by inserting multiple Atrous Spatial Pyramid Pooling (ASPP) layers into DeepLab after feature maps of different resolutions. We then set multiple auxiliary IoU losses at the decoding stage and add auxiliary connections from the intermediate encoding layers to the decoder to assist network training and enhance the discriminative ability of the lower encoding layers. Experiments on the benchmark scene text dataset ICDAR2015 demonstrate the superior performance of the proposed network, named DeepText, over state-of-the-art approaches.
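
The abstract does not state the exact form of the IoU loss or how the auxiliary losses are combined. As a rough illustration only, the sketch below assumes the -ln(IoU) form commonly used in direct-regression detectors, with each pixel predicting its distances to the four box sides. The function names `iou_loss` and `total_loss`, the (top, right, bottom, left) encoding, and the `aux_weight` value are all hypothetical and are not taken from the paper.

```python
import math


def iou_loss(pred, target, eps=1e-9):
    """-ln(IoU) for one pixel (assumed loss form, not confirmed by the abstract).

    pred and target are (top, right, bottom, left) distances from the pixel
    to the sides of the predicted and ground-truth boxes.
    """
    pt, pr, pb, pl = pred
    gt, gr, gb, gl = target
    area_pred = (pt + pb) * (pl + pr)
    area_gt = (gt + gb) * (gl + gr)
    # The pixel lies inside both boxes, so the overlap on each side is
    # the smaller of the two distances.
    inter_h = min(pt, gt) + min(pb, gb)
    inter_w = min(pl, gl) + min(pr, gr)
    inter = inter_h * inter_w
    union = area_pred + area_gt - inter
    return -math.log((inter + eps) / (union + eps))


def total_loss(main_pred, aux_preds, target, aux_weight=0.5):
    """Main loss plus weighted auxiliary losses (aux_weight is hypothetical)."""
    loss = iou_loss(main_pred, target)
    for aux_pred in aux_preds:
        # Each auxiliary decoder output is supervised by the same target.
        loss += aux_weight * iou_loss(aux_pred, target)
    return loss
```

In this sketch a perfect prediction gives a loss of zero, and each auxiliary output adds a down-weighted penalty of the same form, which is one plausible way intermediate layers could receive a direct training signal.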


