Deep Strip-Based Network with Cascade Learning for Scene Text Localization

2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR); Pub Date: 2017-11-01; DOI: 10.1109/ICDAR.2017.140

Dao Wu, Rui Wang, Pengwen Dai, Yueying Zhang, Xiaochun Cao


Citations: 12

Abstract


Scene text detection is a popular research topic in the computer vision community, but it remains challenging because text varies widely in appearance and backgrounds are often cluttered. In this paper, we propose a novel framework for scene text localization. Building on the region proposal network, we develop a Strip-based Text Detection Network (STDN) that uses a vertical anchor mechanism to predict text/non-text strip-shaped proposals. We also incorporate recurrent neural network layers into the network to refine the predicted results. The STDN is trained with cascade learning, in which hard example mining is performed, and this yields a remarkable improvement in precision. In addition, we use a clustering algorithm to generate anchor dimensions automatically rather than hand-picking them, which makes the approach portable and saves time. The text detection framework achieves state-of-the-art performance on ICDAR2013 with an F-measure of 0.89.
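To make the anchor-clustering idea concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say which clustering algorithm is used or which dimensions are clustered, so this example assumes a 1-D k-means over ground-truth box heights and a fixed strip width of 16 pixels. The function names `cluster_anchor_heights` and `make_vertical_anchors`, the quantile-based initialisation, and the value 16 are all hypothetical choices made for illustration.

```python
import numpy as np


def cluster_anchor_heights(heights, k=3, iters=50):
    """1-D k-means over ground-truth box heights.

    Assumption: k-means stands in for the clustering algorithm,
    which the abstract does not name.
    """
    heights = np.asarray(heights, dtype=float)
    # Deterministic initialisation: k evenly spaced quantiles of the data.
    centres = np.quantile(heights, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        # Assign each height to its nearest centre.
        distances = np.abs(heights[:, None] - centres[None, :])
        labels = np.argmin(distances, axis=1)
        new_centres = centres.copy()
        for j in range(k):
            members = heights[labels == j]
            if members.size:  # keep the old centre if a cluster is empty
                new_centres[j] = members.mean()
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return np.sort(centres)


def make_vertical_anchors(anchor_heights, strip_width=16):
    """Pair each clustered height with a fixed strip width (assumed value)."""
    return [(strip_width, float(h)) for h in anchor_heights]


if __name__ == "__main__":
    # Toy heights in pixels; real input would come from training annotations.
    toy_heights = [10, 11, 12, 30, 31, 32, 60, 61, 62]
    anchors = make_vertical_anchors(cluster_anchor_heights(toy_heights, k=3))
    print(anchors)
```

Clustering only the height reflects the strip-shaped proposals described in the abstract, where each anchor covers a narrow vertical slice of the image. If the method clusters width and height jointly, the same loop applies to 2-D points with a suitable distance.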
