DeepText:使用多asp组装的DeepLab从野外检测文本

Qingqing Wang, W. Jia, Xiangjian He, Yue Lu, M. Blumenstein, Ye Huang, Shujing Lyu
{"title":"DeepText:使用多asp组装的DeepLab从野外检测文本","authors":"Qingqing Wang, W. Jia, Xiangjian He, Yue Lu, M. Blumenstein, Ye Huang, Shujing Lyu","doi":"10.1109/ICDAR.2019.00042","DOIUrl":null,"url":null,"abstract":"In this paper, we address the issue of scene text detection in the way of direct regression and successfully adapt an effective semantic segmentation model, DeepLab v3+ [1], for this application. In order to handle texts with arbitrary orientations and sizes and improve the recall of small texts, we propose to extract features of multiple scales by inserting multiple Atrous Spatial Pyramid Pooling (ASPP) layers to the DeepLab after the feature maps with different resolutions. Then, we set multiple auxiliary IoU losses at the decoding stage and make auxiliary connections from the intermediate encoding layers to the decoder to assist network training and enhance the discrimination ability of lower encoding layers. Experiments conducted on the benchmark scene text dataset ICDAR2015 demonstrate the superior performance of our proposed network, named as DeepText, over the state-of-the-art approaches.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"DeepText: Detecting Text from the Wild with Multi-ASPP-Assembled DeepLab\",\"authors\":\"Qingqing Wang, W. Jia, Xiangjian He, Yue Lu, M. Blumenstein, Ye Huang, Shujing Lyu\",\"doi\":\"10.1109/ICDAR.2019.00042\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we address the issue of scene text detection in the way of direct regression and successfully adapt an effective semantic segmentation model, DeepLab v3+ [1], for this application. In order to handle texts with arbitrary orientations and sizes and improve the recall of small texts, we propose to extract features of multiple scales by inserting multiple Atrous Spatial Pyramid Pooling (ASPP) layers to the DeepLab after the feature maps with different resolutions. Then, we set multiple auxiliary IoU losses at the decoding stage and make auxiliary connections from the intermediate encoding layers to the decoder to assist network training and enhance the discrimination ability of lower encoding layers. Experiments conducted on the benchmark scene text dataset ICDAR2015 demonstrate the superior performance of our proposed network, named as DeepText, over the state-of-the-art approaches.\",\"PeriodicalId\":325437,\"journal\":{\"name\":\"2019 International Conference on Document Analysis and Recognition (ICDAR)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 International Conference on Document Analysis and Recognition (ICDAR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDAR.2019.00042\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 International Conference on Document Analysis and Recognition (ICDAR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDAR.2019.00042","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

在本文中,我们以直接回归的方式解决了场景文本检测问题,并成功地为该应用采用了有效的语义分割模型DeepLab v3+[1]。为了处理任意方向和大小的文本,提高小文本的召回率,我们提出在不同分辨率的特征图之后,在DeepLab中插入多个空间金字塔池(ASPP)层来提取多尺度的特征。然后,我们在解码阶段设置多个辅助IoU损耗,并在中间编码层与解码器之间进行辅助连接,以辅助网络训练,增强较低编码层的识别能力。在基准场景文本数据集ICDAR2015上进行的实验表明,我们提出的网络(称为DeepText)优于最先进的方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
DeepText: Detecting Text from the Wild with Multi-ASPP-Assembled DeepLab
In this paper, we address the issue of scene text detection in the way of direct regression and successfully adapt an effective semantic segmentation model, DeepLab v3+ [1], for this application. In order to handle texts with arbitrary orientations and sizes and improve the recall of small texts, we propose to extract features of multiple scales by inserting multiple Atrous Spatial Pyramid Pooling (ASPP) layers to the DeepLab after the feature maps with different resolutions. Then, we set multiple auxiliary IoU losses at the decoding stage and make auxiliary connections from the intermediate encoding layers to the decoder to assist network training and enhance the discrimination ability of lower encoding layers. Experiments conducted on the benchmark scene text dataset ICDAR2015 demonstrate the superior performance of our proposed network, named as DeepText, over the state-of-the-art approaches.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信