{"title":"Weakly Supervised Text Attention Network for Generating Text Proposals in Scene Images","authors":"Li Rong, En MengYi, Liang Jianqiang, Zhang Haibin","doi":"10.1109/ICDAR.2017.61","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.61","url":null,"abstract":"Detection and recognition of textual information in scene images are useful but challenging tasks. Numerous methods have been proposed to solve the problem, and recently the best results have been attained by deep neural network based methods. Training such networks requires large amounts of bounding-box-level or pixel-level annotated data, and generating such data requires huge amounts of labor, which can be expensive and time consuming. In this paper we explore the use of a weakly supervised deep neural network for generating text proposals in natural scene images. The network allows multi-scale inputs and is trained to perform whole-image binary classification, i.e., to tell whether an image contains text or not. After training, the network acquires powerful discriminative features capable of distinguishing text from other objects. To obtain the text location, a text confidence score map is generated from the feature maps of the top two convolutional layers by extracting the class activation map. The value of each pixel in the score map denotes the confidence that the pixel belongs to text. By setting a threshold, the score map is converted to a binary mask map, whose foreground regions are probable text areas. Maximally Stable Extremal Regions (MSERs) are then extracted from these probable text areas and aggregated into groups; by processing these groups, text proposals are obtained. Experimental results show that, without using any bounding-box or pixel-level annotation, the algorithm achieves a recall rate comparable to some fully supervised methods on the ICDAR 2013 focused text dataset and the ICDAR 2015 incidental text dataset.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125557140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
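The proposal-generation pipeline this abstract describes — a class activation map computed as a weighted sum of top-layer feature maps, normalised to per-pixel confidence scores and thresholded into a binary mask — can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, toy feature maps, and the 0.5 threshold are assumptions, and the MSER extraction and grouping steps are omitted.

```python
import numpy as np

def class_activation_map(feature_maps, class_weights):
    """CAM for the 'text' class: a weighted sum of the last conv
    layer's feature maps, normalised to [0, 1] confidence scores.
    feature_maps: (K, H, W); class_weights: (K,)."""
    cam = np.tensordot(class_weights, feature_maps, axes=([0], [0]))
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()
    return cam

def score_map_to_mask(cam, threshold=0.5):
    """Binarise the score map; foreground pixels mark probable text
    areas, from which MSERs would then be extracted and grouped."""
    return cam >= threshold

# toy example: one feature map fires on the top-left quadrant
feats = np.zeros((2, 4, 4))
feats[0, :2, :2] = 1.0
mask = score_map_to_mask(class_activation_map(feats, np.array([2.0, 0.5])))
```

In practice the score map would be upsampled to the input resolution before thresholding; the sketch keeps the feature-map resolution for brevity.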
{"title":"Deep Strip-Based Network with Cascade Learning for Scene Text Localization","authors":"Dao Wu, Rui Wang, Pengwen Dai, Yueying Zhang, Xiaochun Cao","doi":"10.1109/ICDAR.2017.140","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.140","url":null,"abstract":"Scene text detection is currently a popular research topic in the computer vision community. However, it is a challenging task due to the variations of texts and clutter backgrounds. In this paper, we propose a novel framework for scene text localization. Based on the region proposal network, a Strip-based Text Detection Network (STDN) is developed with vertical anchor mechanism to predict the text/non-text strip-shaped proposals. Meanwhile, we incorporate the recurrent neural network layers in the proposed network to refine the predicted results. Specifically, hard example mining is performed to train the STDN with cascade learning, which has a remarkable improvement in precision. Besides, we exploit a clustering algorithm to generate anchor dimensions spontaneously without hand-picking, which is portable and time-saving. The text detection framework achieves the state-of-the-art performance on ICDAR2013 with 0.89 F-measure.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126925957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Robust Symmetry-Based Method for Scene/Video Text Detection through Neural Network","authors":"Yirui Wu, Wenhai Wang, P. Shivakumara, Tong Lu","doi":"10.1109/ICDAR.2017.206","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.206","url":null,"abstract":"Text detection in video/scene images has gained significant attention in the fields of image processing and document analysis due to the inherent challenges caused by variations in contrast, orientation, background, text type, font type, non-uniform illumination and so on. In this paper, we propose a novel text detection method that exploits the symmetry property and appearance features of text for improved accuracy and robustness. First, the proposed method uses Extremal Regions (ER) to detect text candidates in images. Then we propose a novel feature, the Multi-domain Strokes Symmetry Histogram (MSSH), for each text candidate, which describes the inherent symmetry of stroke pixel pairs in the gray, gradient and frequency domains. Furthermore, deep convolutional features are extracted to describe the appearance of each text candidate. We then fuse them with an Auto-Encoder network to obtain a more discriminative text descriptor for classification. Finally, the proposed method constructs text lines based on the classification results. We demonstrate the effectiveness and robustness of the proposed method by testing on four different benchmark databases.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114955146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scene Text Relocation with Guidance","authors":"Anna Zhu, S. Uchida","doi":"10.1109/ICDAR.2017.212","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.212","url":null,"abstract":"Applying object proposal techniques to scene text detection has become popular because of their significant improvements in speed and accuracy for object detection. However, some of the text regions remaining after proposal classification overlap and are hard to remove or merge. In this paper, we present a scene text relocation system that refines the detection from text proposals to text. An object-proposal-based deep neural network is employed to obtain the text proposals. To tackle the overlapping problem, a refinement deep neural network relocates the overlapped regions by estimating the text probability inside them and locating the accurate text regions by thresholding. Since the space between words in different text lines varies, a guidance mechanism is proposed in text relocation to guide where to extract the text regions at the word level. This refinement procedure boosts precision by removing multiple overlapped text regions and joining cracked text regions. The experimental results on the standard benchmark ICDAR 2013 demonstrate the effectiveness of the proposed approach.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"129 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115553460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gated Convolutional Recurrent Neural Networks for Multilingual Handwriting Recognition","authors":"Théodore Bluche, Ronaldo O. Messina","doi":"10.1109/ICDAR.2017.111","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.111","url":null,"abstract":"In this paper, we propose a new neural network architecture for state-of-the-art handwriting recognition, as an alternative to multi-dimensional long short-term memory (MD-LSTM) recurrent neural networks. The model is based on a convolutional encoder of the input images and a bidirectional LSTM decoder predicting character sequences. In this paradigm, we aim at producing generic, multilingual and reusable features with the convolutional encoder, leveraging more data for transfer learning. The architecture is also motivated by the need for fast training on GPUs and fast decoding on CPUs. The main contribution of this paper lies in the convolutional gates in the encoder, enabling hierarchical context-sensitive feature extraction. Experiments on a large benchmark including seven languages show a consistent and significant improvement of the proposed approach over our previous production systems. We also report state-of-the-art results on line- and paragraph-level recognition on the IAM and Rimes databases.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122374277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
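The convolutional gates this abstract credits as the main contribution can be read as an element-wise sigmoid gate computed by one branch and multiplied into a parallel feature branch. A minimal numpy sketch under that reading — the convolutions themselves are omitted and all names are illustrative, not the paper's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_features(feature_branch, gate_branch):
    """Element-wise gating: the sigmoid of one branch's (conv) output
    modulates the other branch's features, letting the network pass or
    suppress context at each position."""
    return feature_branch * sigmoid(gate_branch)

# a strongly positive gate passes the feature, a strongly negative one
# suppresses it, and a zero gate halves it
out = gated_features(np.ones(3), np.array([0.0, 50.0, -50.0]))
```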
{"title":"Histogram of Exclamation Marks and Its Application for Comics Analysis","authors":"Sotaro Hiroe, S. Hotta","doi":"10.1109/ICDAR.2017.294","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.294","url":null,"abstract":"This paper proposes a histogram formed by counting the exclamation marks in comic books, for use in comics analysis. Exclamation marks in comic books are used to express characters' emotions and are frequently depicted in excited scenes. They are also easy to detect across comic books written by various authors. Hence we represent each comic book by the distribution of its exclamation marks as a histogram, and use this histogram for topic change detection and for visualizing relationships between comic books. Experimental results on real comic books show that our bold approach has potential for approximating the contents of comic books using the proposed histograms.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122699497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Machine Learning System for Assisting Neophyte Researchers in Digital Libraries","authors":"Bissan Audeh, M. Beigbeder, C. Largeron","doi":"10.1109/ICDAR.2017.60","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.60","url":null,"abstract":"Although existing digital libraries such as Google Scholar and CiteSeerX propose advanced search functionalities, they do not take into consideration whether the user is new to, or specialized in, the research domain of their query. As a result, neophytes can spend a lot of time checking documents that are not adapted to their initial information need. In this paper, we propose NeoTex, a machine-learning-based approach that combines content-based retrieval and citation graph measures to propose documents adapted to new researchers. The contributions of our work are: designing a model for scientific retrieval suited to neophytes, defining an evaluation protocol with realistic ground truths, and testing the model on a large real collection from a national digital library.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"01 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129496836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comic Characters Detection Using Deep Learning","authors":"Nhu-Van Nguyen, Christophe Rigaud, J. Burie","doi":"10.1109/ICDAR.2017.290","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.290","url":null,"abstract":"Comic character detection has been an interesting area in comic analysis, as it not only allows more efficient indexing and retrieval of comic books but also yields an understanding of comics that helps in creating their digital form. In recent years, several methods proposed to extract or detect characters from comics have given reasonable performance. However, they are always evaluated on their authors' own datasets, without comparison to other works or experiments on a standard dataset. In this work, we take advantage of the recent and significant development of deep learning and apply it to comic character detection. We use the latest object detection deep networks to train a comic character detector on our proposed dataset. By experimenting on our proposed dataset and on available datasets from previous works, we find that this method significantly outperforms existing methods. We believe that this state-of-the-art approach can be considered a reliable baseline for comparing and better understanding future detection techniques.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129933820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semi-Supervised Transfer Learning for Convolutional Neural Network Based Chinese Character Recognition","authors":"Yejun Tang, Bing Wu, Liangrui Peng, Changsong Liu","doi":"10.1109/ICDAR.2017.79","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.79","url":null,"abstract":"Although transfer learning has attracted great interest from researchers, how to utilize unlabeled data is still an open and important problem in this area. We propose a novel semi-supervised transfer learning (STL) method that incorporates a Multi-Kernel Maximum Mean Discrepancy (MK-MMD) loss into the traditional fine-tuned Convolutional Neural Network (CNN) transfer learning framework for Chinese character recognition. The proposed method consists of three steps. First, a CNN model is trained on massive labeled samples in the source domain. Then the CNN model is fine-tuned on a few labeled samples in the target domain. Finally, the CNN model is trained with both a large number of unlabeled samples and the limited labeled samples in the target domain to minimize the MK-MMD loss. Experiments investigate detailed configurations and parameters of the proposed STL method with several frequently used CNN structures, including AlexNet, GoogLeNet, and ResNet. Experimental results on practical Chinese character transfer learning tasks, such as Dunhuang historical Chinese character recognition, indicate that the proposed method can significantly improve recognition accuracy in the target domain.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128428796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
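For illustration, an MK-MMD term of the kind minimised in the abstract's final step can be sketched as a squared maximum mean discrepancy averaged over several RBF kernels. This is a generic sketch, not the paper's configuration: the hand-picked bandwidths and the biased (plug-in) estimator are assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """Gaussian kernel matrix between row-sample matrices X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mk_mmd(source, target, gammas=(0.5, 1.0, 2.0)):
    """Biased MMD^2 estimate averaged over multiple kernel bandwidths:
    it is small when source- and target-domain features are similarly
    distributed, so minimising it aligns the two domains."""
    total = 0.0
    for g in gammas:
        total += (rbf_kernel(source, source, g).mean()
                  + rbf_kernel(target, target, g).mean()
                  - 2.0 * rbf_kernel(source, target, g).mean())
    return total / len(gammas)
```

In a training loop this value would be added, with some weight, to the usual classification loss over the labeled target samples.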
{"title":"Radical-Based Chinese Character Recognition via Multi-Labeled Learning of Deep Residual Networks","authors":"Tie-Qiang Wang, Fei Yin, Cheng-Lin Liu","doi":"10.1109/ICDAR.2017.100","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.100","url":null,"abstract":"The digitization of Chinese historical documents poses a new challenge: within the huge set of character categories, the majority of characters are no longer in common use and have few samples for training character classifiers. To address this problem, we consider the radical-level composition of Chinese characters and propose to detect position-dependent radicals using a deep residual network with multi-labeled learning. This enables the recognition of novel characters without training samples, provided the characters are composed of radicals that appear in the training samples. In multi-labeled learning, each training character sample is labeled as positive for each radical it contains, so that after training, all the radicals appearing in a character can be detected. Experimental results on a large-category-set database of printed Chinese characters demonstrate that the proposed method detects radicals accurately. Moreover, according to radical configurations, our model can reliably recognize novel characters as well as trained characters.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128209157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
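The multi-labeled readout this abstract describes — one independent detector per position-dependent radical, with every radical above threshold reported — can be sketched as thresholded sigmoid outputs. The radical labels, toy logits, and 0.5 threshold below are illustrative assumptions; mapping a detected radical set back to a character would additionally require a dictionary of radical compositions, which is omitted here.

```python
import numpy as np

def detect_radicals(logits, radical_labels, threshold=0.5):
    """Multi-label readout: an independent sigmoid per radical, so a
    single character can activate several (position-dependent) radicals
    at once, unlike a softmax over whole-character classes."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits)))
    return {lab for lab, p in zip(radical_labels, probs) if p >= threshold}

# hypothetical network outputs for a character with two radicals
detected = detect_radicals(
    [3.0, -3.0, 2.0],
    ["left:radical-A", "right:radical-B", "top:radical-C"])
```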