A deep learning based scene text detector combining two strategies
Ting Jin, Zhaogong Zhang, Zhichao Zhang
Proceedings of the 2022 5th International Conference on Algorithms, Computing and Artificial Intelligence, published 2022-12-23
DOI: 10.1145/3579654.3579676 (https://doi.org/10.1145/3579654.3579676)
Citations: 0
Abstract
Detecting scene text has been a challenging task because of the complex geometric layouts of text. State-of-the-art scene text detection methods fall broadly into two categories. Top-down methods view text as a whole and locate it either by regressing the points of text bounding boxes or by learning the geometric properties of text, but most of them have difficulty separating neighboring text instances. Bottom-up methods treat text as composed of simple local components and recover text instances through post-processing, but most of them depend on accurate segmentation results. In this paper, we propose a method that combines the two strategies while avoiding their respective drawbacks. Specifically, we use a top-down strategy to obtain text contours and then apply a contour scoring module that scores these contours to obtain more accurate results. In addition, we use a bottom-up strategy to obtain kernels and similarity vectors. Pixel aggregation then combines the outputs of the two parts to obtain a more flexible representation of text instances. Experiments on several benchmark datasets demonstrate the effectiveness of the proposed method.
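The abstract does not spell out how pixel aggregation merges kernels and similarity vectors, so the following is only a minimal sketch modeled on the pixel-aggregation step popularized by PAN (Pixel Aggregation Network), not the authors' implementation. It assumes three inputs: a boolean `text_mask` (here assumed to come from the filled top-down contours), a boolean `kernel_mask` from the bottom-up branch, and a per-pixel `similarity` vector. Each kernel grows breadth-first into neighboring text pixels whose similarity vector lies within `threshold` of that kernel's mean vector. The names `label_kernels`, `pixel_aggregation`, and the default `threshold` are hypothetical, and the contour scoring module is not modeled.

```python
from collections import deque
from math import dist  # Euclidean distance between two points (Python 3.8+)

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity


def label_kernels(kernel_mask):
    """Label 4-connected components of the kernel mask; 0 means no kernel."""
    h, w = len(kernel_mask), len(kernel_mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if kernel_mask[y][x] and labels[y][x] == 0:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBORS:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and kernel_mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels


def pixel_aggregation(text_mask, kernel_mask, similarity, threshold=1.0):
    """Hypothetical PAN-style aggregation: grow each kernel into adjacent text
    pixels whose similarity vector is within `threshold` of the kernel mean."""
    labels = label_kernels(kernel_mask)
    h, w = len(labels), len(labels[0])

    # Mean similarity vector of each kernel.
    sums, sizes = {}, {}
    for y in range(h):
        for x in range(w):
            k = labels[y][x]
            if k:
                vec = similarity[y][x]
                sums.setdefault(k, [0.0] * len(vec))
                sums[k] = [s + v for s, v in zip(sums[k], vec)]
                sizes[k] = sizes.get(k, 0) + 1
    means = {k: [s / sizes[k] for s in sums[k]] for k in sums}

    # Breadth-first growth from all kernel pixels at once; the first kernel
    # to reach an unlabeled text pixel claims it.
    queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
    while queue:
        cy, cx = queue.popleft()
        k = labels[cy][cx]
        for dy, dx in NEIGHBORS:
            ny, nx = cy + dy, cx + dx
            if (0 <= ny < h and 0 <= nx < w and text_mask[ny][nx]
                    and labels[ny][nx] == 0
                    and dist(similarity[ny][nx], means[k]) < threshold):
                labels[ny][nx] = k
                queue.append((ny, nx))
    return labels
```

For example, two touching words in a single row that a text mask alone cannot separate are split by their similarity vectors:

```python
text = [[True] * 6]
kernels = [[True, False, False, False, False, True]]
sim = [[(0, 0), (0, 0.1), (0, 0.2), (5, 5), (5, 5.1), (5, 5)]]
pixel_aggregation(text, kernels, sim)  # [[1, 1, 1, 2, 2, 2]]
```

The breadth-first order keeps each growth step local to a kernel's boundary, which is why neighboring instances with distinct similarity vectors stay separate even when their text regions touch.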