Dao Wu, Rui Wang, Pengwen Dai, Yueying Zhang, Xiaochun Cao
2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), published 2017-11-01. DOI: 10.1109/ICDAR.2017.140 (https://doi.org/10.1109/ICDAR.2017.140)
Deep Strip-Based Network with Cascade Learning for Scene Text Localization
Scene text detection is an active research topic in the computer vision community, but it remains challenging because text varies widely in appearance and backgrounds are often cluttered. In this paper, we propose a novel framework for scene text localization. Building on the region proposal network, we develop a Strip-based Text Detection Network (STDN) that uses a vertical anchor mechanism to predict text/non-text strip-shaped proposals. We incorporate recurrent neural network layers into the network to refine the predicted results. We train the STDN with cascade learning based on hard example mining, which markedly improves precision. In addition, we use a clustering algorithm to generate anchor dimensions automatically rather than hand-picking them, which makes the approach portable and saves time. The framework achieves state-of-the-art performance on ICDAR2013, with an F-measure of 0.89.
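The abstract does not say which clustering algorithm is used or which box attributes are clustered. As a rough illustration only, the sketch below assumes plain k-means over the heights of ground-truth text strips, so that each cluster center becomes one anchor height. The function name `cluster_anchor_heights`, the quantile-based initialisation, and the sample data are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def cluster_anchor_heights(heights: Sequence[float], k: int, iters: int = 50) -> List[float]:
    """1-D k-means over box heights; returns k anchor heights in ascending order.

    Assumption: the paper's clustering step is approximated here by k-means
    on heights alone. The abstract does not confirm this choice.
    """
    if k < 1 or k > len(heights):
        raise ValueError("k must be between 1 and the number of heights")
    data = sorted(heights)
    n = len(data)
    # Deterministic initialisation: evenly spaced quantiles of the sorted data.
    centers = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        # Assignment step: group each height with its nearest center.
        groups: List[List[float]] = [[] for _ in range(k)]
        for h in data:
            nearest = min(range(k), key=lambda j: abs(h - centers[j]))
            groups[nearest].append(h)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break  # converged: assignments can no longer change
        centers = updated
    return sorted(centers)


# Hypothetical ground-truth strip heights in pixels (illustrative data only).
sample_heights = [10, 11, 12, 13, 30, 31, 33, 34, 70, 72, 75, 79]
anchors = cluster_anchor_heights(sample_heights, k=3)
print(anchors)
```

Deriving anchors from the training annotations in this way means that moving to a dataset with different text sizes only requires rerunning the clustering, which is one plausible reading of the portability claim in the abstract.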