{"title":"Improved Localization Accuracy by LocNet for Faster R-CNN Based Text Detection","authors":"Zhuoyao Zhong, Lei Sun, Qiang Huo","doi":"10.1109/ICDAR.2017.155","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.155","url":null,"abstract":"Although Faster R-CNN based approaches have achieved promising results for text detection, their localization accuracy is not satisfactory in certain cases. In this paper, we propose to use a LocNet to improve the localization accuracy of a Faster R-CNN based text detector. Given a proposal generated by region proposal network (RPN), instead of predicting directly the bounding box coordinates of the concerned text instance, the proposal is enlarged to create a search region so that conditional probabilities to each row and column of this search region can be assigned, which are then used to infer accurately the concerned bounding box. Experiments demonstrate that the proposed approach boosts the localization accuracy for Faster R-CNN based text detection significantly. Consequently, our new text detector has achieved superior performance on ICDAR-2011, ICDAR-2013 and MULTILINGUAL text detection benchmark tasks.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123208560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fully Convolutional Neural Networks for Newspaper Article Segmentation","authors":"B. Meier, Thilo Stadelmann, Jan Stampfli, M. Arnold, Mark Cieliebak","doi":"10.1109/ICDAR.2017.75","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.75","url":null,"abstract":"Segmenting newspaper pages into articles that semantically belong together is a necessary prerequisite for article-based information retrieval on print media collections such as archives and libraries. It is challenging due to vastly differing layouts of papers, various content types and different languages, but commercially very relevant, e.g. for media monitoring. We present a semantic segmentation approach based on the visual appearance of each page. We apply a fully convolutional neural network (FCN) that we train in an end-to-end fashion to transform the input image into a segmentation mask in one pass. We show experimentally that the FCN performs very well: it outperforms a deep learning-based commercial solution by a large margin in terms of segmentation quality while in addition being computationally two orders of magnitude more efficient.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"146 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123337344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Temporal Integration for Word-Wise Caption and Scene Text Identification","authors":"Sangheeta Roy, P. Shivakumara, U. Pal, Tong Lu, A. W. Wahab","doi":"10.1109/ICDAR.2017.65","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.65","url":null,"abstract":"Generally video consists of edited text (i.e., caption text) and natural text (i.e., scene text), and these two texts differ from one another in nature as well as characteristics. Such different behaviors of caption and scene texts lead to poor accuracy for text recognition in video. In this paper, we explore wavelet decomposition and temporal coherency for the classification of caption and scene text. We propose wavelet of high frequency sub-bands to separate text candidates that are represented by high frequency coefficients in an input word. The proposed method studies the distribution of text candidates over word images based on the fact that the standard deviation of text candidates is high at the first zone, low at the middle zone and high at the third zone. This is extracted by mapping standard deviation values to 8 equal sized bins formed based on the range of standard deviation values. The correlation among bins at the first and second levels of wavelets is explored to differentiate caption and scene text and for determining the number of temporal frames to be analyzed. The properties of caption and scene texts are validated with the chosen temporal frames to find the stable property for classification. Experimental results on three standard datasets (ICDAR 2015, YVT and License Plate Video) show that the proposed method outperforms the existing methods in terms of classification rate and improves recognition rate significantly based on classification results.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"283 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121314043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rank-Reducing Two-Dimensional Grammars for Document Layout Analysis","authors":"D. Prusa, Akio Fujiyoshi","doi":"10.1109/ICDAR.2017.185","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.185","url":null,"abstract":"We study the task of document layout analysis based on two-dimensional context-free grammars. We first identify a subclass of the grammars sufficient for a document structure description where productions follow a mechanism inducing regular languages in the case of one-dimensional productions. We then show that properties of such grammars can be conveniently utilized to implement a very fast top-down parser. Experimental results are reported for PDF documents, which are chosen as a test domain since we are motivated by the development of digital document access methods for people with disabilities, in which retrieval of structural information plays an important role.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"737 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122950261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"1990 US Census Form Recognition Using CTC Network, WFST Language Model, and Surname Correction","authors":"Huaigu Cao, Stephen Rawls, P. Natarajan","doi":"10.1109/ICDAR.2017.163","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.163","url":null,"abstract":"This paper presents a system for transcribing 1990 US census forms. Extraction of information from census forms is useful for creating a genealogy database and better archiving census forms. We trained CTC/LSTM-RNN networks as our OCR engine. We solved the major challenge in language modeling by defining syntactical constraints with WFST language models. We made two major technical contributions in this paper. Firstly, 1990 US census forms were automatically transcribed with compelling accuracy for the first time using our system, which can be useful in downstream study in information extracted from census forms. Secondly, we designed a novel post-processing algorithm that improved the recognition accuracy of surnames significantly.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"518 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123105916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Normalised Local Naïve Bayes Nearest-Neighbour Classifier for Offline Writer Identification","authors":"H. Mohammed, V. Märgner, T. Konidaris, H. Siegfried Stiehl","doi":"10.1109/ICDAR.2017.168","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.168","url":null,"abstract":"Writer identification and verification can be viewed as a classification problem, where each writer represents a class. We propose a classifier for offline, text-independent, and segmentation-free writer identification based on the Local Naïve Bayes Nearest-Neighbour (Local NBNN) classification. Our proposed method takes into consideration the particularity of handwriting patterns by adding a constraint to prevent the matching of irrelevant keypoints. Furthermore, a normalisation factor is proposed to cope with the prevalent problem of unbalanced data. The method has been evaluated on several public datasets of different writing systems and state-of-the-art results are shown to be improved.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114236062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Scale Multi-Task FCN for Semantic Page Segmentation and Table Detection","authors":"Dafang He, Scott D. Cohen, Brian L. Price, Daniel Kifer, C. Lee Giles","doi":"10.1109/ICDAR.2017.50","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.50","url":null,"abstract":"Page segmentation and table detection play an important role in understanding the structure of documents. We present a page segmentation algorithm that incorporates state-of-the-art deep learning methods for segmenting three types of document elements: text blocks, tables, and figures. We propose a multi-scale, multi-task fully convolutional neural network (FCN) for the tasks of semantic page segmentation and element contour detection. The semantic segmentation network accurately predicts the probability at each pixel of the three element classes. The contour detection network accurately predicts instance level \"edges\" around each element occurrence. We propose a conditional random field (CRF) that uses features output from the semantic segmentation and contour networks to improve upon the semantic segmentation network output. Given the semantic segmentation output, we also extract individual table instances from the page using some heuristic rules and a verification network to remove false positives. We show that although we only consider a page image as input, we produce comparable results with other methods that rely on PDF file information and heuristics and hand-crafted features tailored to specific types of documents. Our approach learns the representative features for page segmentation from real and synthetic training data. The learning-based property makes it a more general method than existing methods in terms of document types and element appearances. For example, our method reliably detects sparsely lined tables which are hard for rule-based or heuristic methods.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115965189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Font Setting Based Bayesian Model to Extract Mathematical Expression in PDF Files","authors":"Xing Wang, Jyh-Charn S. Liu","doi":"10.1109/ICDAR.2017.129","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.129","url":null,"abstract":"This paper proposes a Font Setting based Bayesian (FSB) model to extract mathematical expressions (MEs) in portable document format (PDF) files. The FSB model is a self-adaptive unsupervised algorithm which first uses rules to identify ME and non-ME (NME) and then extracts the remaining ME using Bayesian inference, based on the observation that MEs tend to be repeatedly represented in a particular style. PDF files are first processed using a PDF parser, and document layout is analyzed using a projection profiling cutting based algorithm to detect columns and lines. Heuristic rules derived from the knowledge of math usage and writing practices are employed to reason about the posterior probability of a char being ME vs. NME, conditional upon the font and value information. Based on the char level posterior probability, Bayesian inference is used to infer whether a non-separable character set (NSCS) is ME or not. Consecutive (fragmented) ME NSCS are merged to produce final results. Experimental results show that our approach achieves 0.006 (0.135) false rate and 0.111/0.093 miss rate for IME (EME) extraction. As for NSCS classification, our approach achieves 93.1% precision, 90.5% recall rate, and F1 score of 0.918. The processing time is markedly shorter than supervised machine learning techniques, and the extracted information and analytics products can be used for high level applications.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125624115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Handwriting Style Mixture Adaptation","authors":"Hong-Ming Yang, Xu-Yao Zhang, Fei Yin, Cheng-Lin Liu","doi":"10.1109/ICDAR.2017.166","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.166","url":null,"abstract":"In handwriting recognition, the test data usually come from multiple writers who do not appear in the training data. Therefore, adapting the base classifier towards the new style of each writer can significantly improve the generalization performance. Traditional writer adaptation methods usually assume that there is only one writer (one style) in the test data, and we call this situation style-clear adaptation. However, a more common situation is that multiple handwriting styles exist in the test data, which appears widely in multi-font documents and handwriting data produced by the cooperation of multiple writers. We call adaptation in this situation style-mixture adaptation. To deal with this problem, in this paper, we propose a novel method called K-style mixture adaptation (K-SMA) with the assumption that there are K styles in total in the test data. Specifically, we first partition the test data into K groups (style clustering) according to their style consistency, which is measured by a newly designed style feature that can eliminate class (category) information and keep handwriting style information. After that, in each group, a style transfer mapping (STM) is used for writer adaptation. Since the initial style clustering may not be reliable, we repeat this process iteratively to improve the adaptation performance. The K-SMA model is fully unsupervised and requires neither the class label nor the style index. Moreover, the K-SMA model can be effectively combined with the benchmark convolutional neural network (CNN) models. Experiments on the online Chinese handwriting database CASIA-OLHWDB demonstrate that K-SMA is an efficient and effective solution for style-mixture adaptation.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122428177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Capturing Handwritten Ink Strokes with a Fast Video Camera","authors":"Chelhwon Kim, Patrick Chiu, H. Oda","doi":"10.1109/ICDAR.2017.209","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.209","url":null,"abstract":"We present a system for capturing ink strokes written with ordinary pen and paper using a fast camera with a frame rate comparable to a stylus digitizer. From the video frames, ink strokes are extracted and used as input to an online handwriting recognition engine. A key component in our system is a pen up/down detection model for detecting the contact of the pen-tip with the paper in the video frames. The proposed model consists of feature representation with convolutional neural networks and classification with a recurrent neural network. We also use a high speed tracker with kernelized correlation filters to track the pen-tip. For training and evaluation, we collected labeled video data of users writing English and Japanese phrases from public datasets, and we report on character accuracy scores for different frame rates in the two languages.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131331136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}