Qingqing Wang, W. Jia, Xiangjian He, Yue Lu, M. Blumenstein, Ye Huang, Shujing Lyu
{"title":"DeepText: Detecting Text from the Wild with Multi-ASPP-Assembled DeepLab","authors":"Qingqing Wang, W. Jia, Xiangjian He, Yue Lu, M. Blumenstein, Ye Huang, Shujing Lyu","doi":"10.1109/ICDAR.2019.00042","DOIUrl":null,"url":null,"abstract":"In this paper, we address the issue of scene text detection in the way of direct regression and successfully adapt an effective semantic segmentation model, DeepLab v3+ [1], for this application. In order to handle texts with arbitrary orientations and sizes and improve the recall of small texts, we propose to extract features of multiple scales by inserting multiple Atrous Spatial Pyramid Pooling (ASPP) layers to the DeepLab after the feature maps with different resolutions. Then, we set multiple auxiliary IoU losses at the decoding stage and make auxiliary connections from the intermediate encoding layers to the decoder to assist network training and enhance the discrimination ability of lower encoding layers. Experiments conducted on the benchmark scene text dataset ICDAR2015 demonstrate the superior performance of our proposed network, named as DeepText, over the state-of-the-art approaches.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 International Conference on Document Analysis and Recognition (ICDAR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDAR.2019.00042","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
In this paper, we address the issue of scene text detection in the way of direct regression and successfully adapt an effective semantic segmentation model, DeepLab v3+ [1], for this application. In order to handle texts with arbitrary orientations and sizes and improve the recall of small texts, we propose to extract features of multiple scales by inserting multiple Atrous Spatial Pyramid Pooling (ASPP) layers to the DeepLab after the feature maps with different resolutions. Then, we set multiple auxiliary IoU losses at the decoding stage and make auxiliary connections from the intermediate encoding layers to the decoder to assist network training and enhance the discrimination ability of lower encoding layers. Experiments conducted on the benchmark scene text dataset ICDAR2015 demonstrate the superior performance of our proposed network, named as DeepText, over the state-of-the-art approaches.