{"title":"Recognition of Handwritten Mathematical Characters on Whiteboards Using Colour Images","authors":"Behrang Sabeghi Saroui, V. Sorge","doi":"10.1109/DAS.2014.66","DOIUrl":"https://doi.org/10.1109/DAS.2014.66","url":null,"abstract":"Automatic handwriting recognition has enjoyed significant improvements in the past decades. In particular, online recognition of mathematical formulas has seen a number of important advancements both for pen input devices as well as for smart boards. However, in reality most mathematics is still taught and developed on regular whiteboards and that the offline recognition still remains a challenging task. In this paper we are therefore concerned with the offline recognition of handwritten notes on whiteboards, presenting a novel way of transforming offline data via image analysis into equivalent online data. We use trajectory recovery techniques and statistical classification on high quality colour images to extract information on the strokes composing a character, such as start or end points and stroke direction. This data is then appropriately prepared and passed to an online character recogniser specialising on mathematical characters for the actual recognition task. 
We demonstrate the effectiveness of our new technique with experiments on a collection of 1000 whiteboard images of different mathematical symbols, Latin and Greek characters that have been obtained from a variety of writers using different types of pens.","PeriodicalId":220495,"journal":{"name":"2014 11th IAPR International Workshop on Document Analysis Systems","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133549291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Two Level Algorithm for Text Detection in Natural Scene Images","authors":"Li Rong, Suyu Wang, Zhixin Shi","doi":"10.1109/DAS.2014.41","DOIUrl":"https://doi.org/10.1109/DAS.2014.41","url":null,"abstract":"In this paper we present a two-level method to detect text in natural scene images. In the first level, connected components (referred as CCs) are got from the images. Then candidate text lines are extracted and groups of connected components that align in horizontal or vertical direction are got. We think CCs in these groups have high probability are texts. To validate which CC is text, a SVM is trained to make an initial decision. The output of SVM is calibrated to posterior probability. Then we use the information of posterior probability of SVM and information of whether the connected component is in a group to divide the connected components into four classes: texts, non-texts, probable texts and undetermined CCs. In the second level, a conditional random field model is used to make final decision. Relationship between CCs is modeled by a network G(V, E), Vertices of the graph correspond to CCs. The determination in the first level will influence the second levels determination by giving different parameters of data term for the four classes of CCs. By this way, we not only use information of a single CCs feature, but also use the information of whether a CC is in a group to make final decision of whether the CC is text or non-text. 
Experiments show that the method is effective.","PeriodicalId":220495,"journal":{"name":"2014 11th IAPR International Workshop on Document Analysis Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130424321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Separation of Graphics (Superimposed) and Scene Text in Video Frames","authors":"P. Shivakumara, N. V. Kumar, D. S. Guru, C. Tan","doi":"10.1109/DAS.2014.20","DOIUrl":"https://doi.org/10.1109/DAS.2014.20","url":null,"abstract":"The presence of both graphics and scene text in video frames makes text detection and recognition problem more challenging because the nature of the two texts differs significantly. This paper aims to propose a novel method for separation of graphics and scene text to achieve good recognition rate based on the fact that Canny and Sobel edge pattern share common property for text. We propose to use Ring Radius Transform to identify the radius that represents the medial axis in the edge image. We study the intra relationship between bins of the histograms over respective radius values, resulting in intra line graphs. In this way, the method finds intra line graphs for both Canny and Sobel edge images of the input text lines. To identify the unique distribution for separation of graphics and scene texts, we explore the inter relationship between intra line graphs of Canny and Sobel edge image with respective medial axes values. This results in Gaussian distribution for graphics and non-Gaussian for scene text. Experimental results on horizontal, non-horizontal, different scripts etc. 
show that the proposed method is effective for classification and the results of baseline recognition methods show that recognition rate is significantly improved after classification.","PeriodicalId":220495,"journal":{"name":"2014 11th IAPR International Workshop on Document Analysis Systems","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133891397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Maurdor Project: Improving Automatic Processing of Digital Documents","authors":"S. Brunessaux, Patrick Giroux, B. Grilhères, M. Manta, Maylis Bodin, K. Choukri, Olivier Galibert, Juliette Kahn","doi":"10.1109/DAS.2014.58","DOIUrl":"https://doi.org/10.1109/DAS.2014.58","url":null,"abstract":"This paper presents the achievements of an experimental project called Maurdor (Moyens AUtomatisés deReconnaissance de Documents ecRits - Automatic Processingof Digital Documents) funded by the French DGA that aims at improving processing technologies for handwritten and typewritten documents in French, English and Arabic. The first part describes the context and objectives of the project. The second part deals with the challenge of creating a realistic corpus of 10,000 annotated documents to support the efficient development and evaluation of processing modules. The third part presents the organisation, metric definition and results of the Maurdor International evaluation campaign. The last part presents the Maurdor demonstrator with a functional and technical perspective.","PeriodicalId":220495,"journal":{"name":"2014 11th IAPR International Workshop on Document Analysis Systems","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128789675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Local Binary Patterns for Arabic Optical Font Recognition","authors":"Anguelos Nicolaou, Fouad Slimane, V. Märgner, M. Liwicki","doi":"10.1109/DAS.2014.71","DOIUrl":"https://doi.org/10.1109/DAS.2014.71","url":null,"abstract":"Optical Font Recognition (OFR) has been proven to increase Optical Character Recognition (OCR) accuracy, but it can also help in harvesting semantic information from documents. It therefore becomes a part of many Document Image Analysis (DIA) pipelines. Our work is based on the hypothesis that Local Binary Patterns (LBP), as a generic texture classification method, can address several distinct DIA problems at the same time such as OFR, script detection, writer identification, etc. In this paper we strip down the Redundant Oriented LBP (RO-LBP) method, previously used in writer identification, and apply it for OFR with the goal of introducing a generic method that classifies text as oriented texture. We focus on Arabic OFR and try to perform a thorough comparison of our method and the leading Gaussian Mixture Model method that is developed specifically for the task. Depending on the nature of proposed OFR method, each method's performance is usually evaluated on different data and with different evaluation protocols. The proposed experimental procedure addresses this problem and allows us to compare OFR methods that are fundamentally different by adapting them to a common measurement protocol. 
In performed experiments LBP method achieves perfect results on large text blocks generated from the APTI database, while preserving its very broad generic attributes as proven by secondary experiments.","PeriodicalId":220495,"journal":{"name":"2014 11th IAPR International Workshop on Document Analysis Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129767521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Plane Geometry Figure Retrieval with Bag of Shapes","authors":"Lu Liu, Xiaoqing Lu, Keqiang Li, J. Qu, Liangcai Gao, Zhi Tang","doi":"10.1109/DAS.2014.53","DOIUrl":"https://doi.org/10.1109/DAS.2014.53","url":null,"abstract":"Digital education is serving an increasingly important function in most educational institutions, thus resulting in the production of a large number of digital documents online for education purposes. However, convenient ways to retrieve mathematic geometry questions are lacking because current retrieval systems largely rely on keywords instead of geometry figure images. This study focuses on plane geometry figure (PGF) image retrieval with the aim of retrieving relevant geometry images that contain more structural information than a question text stem. To fully use geometrical properties, a Bag-of-shapes (BoS) method is proposed to build the feature descriptor of an image. The BoS method contains either basic geometric primitives or dual-primitive structures along with several specific geometrical features for shape description. Based on the BoS feature descriptor, we apply cosine similarity with group feature weight as vector similarity measure for ranking to achieve high efficiency. For a PGF image query, the retrieval results are provided in an appropriate ranking order, which has high visual similarity with respect to human perception. 
Retrieval experiments and evaluation results show the effectiveness and efficiency of the proposed BoS shape descriptor.","PeriodicalId":220495,"journal":{"name":"2014 11th IAPR International Workshop on Document Analysis Systems","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125364099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Training Set Generation for Better Historic Document Transcription and Compression","authors":"G. Silva, R. Lins, Cesar Gomes","doi":"10.1109/DAS.2014.30","DOIUrl":"https://doi.org/10.1109/DAS.2014.30","url":null,"abstract":"The more complete the training set of an optical character recognition platform, the greater the chances of obtaining a better precision in transcription. The development of a database for such purpose is a task of paramount effort as it is performed manually and must be as extensive as possible in order to potentially cover all words in a language. Dealing with historic documents either handwritten, typed, or printed is even a harder effort as documents are often degraded by time and storage conditions. The recent work of Silva-Lins showed how to automatically generate training sets of isolated characters for cursive writing of one specific person. This is particularly important in the transcription of historic files of important people. The present work improves that strategy by analyzing letter ligature patterns. The improvement in OCR transcription accuracy both of printed, typed and handwritten documents is borne out by experimental evidence.","PeriodicalId":220495,"journal":{"name":"2014 11th IAPR International Workshop on Document Analysis Systems","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134373910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Feasibility Study of Visualizing Diversity of Japanese Hiragana Handwritings by Multidimensional Scaling of Earth Mover's Distance toward Assisting Forensic Experts in Writer Verification","authors":"Yoshinori Akao, Atsushi Yamamoto, Yoshiyasu Higashikawa","doi":"10.1109/DAS.2014.13","DOIUrl":"https://doi.org/10.1109/DAS.2014.13","url":null,"abstract":"In this paper, we demonstrated a mapping procedure to visualize the diversity of overall handwriting shapes of five Japanese Hiragana characters for the purpose of assisting forensic examiners in the process of writer verification. Multidimensional scaling was applied to Earth Mover's Distance (EMD) data calculated between 60 different writers in order to visualize each writer's feature in population. EMD flow was calculated between k-means cluster centroids, which are representative points of kernel density distribution of handwritten stroke of each writer within six trials. Experimental results showed that the relative relation of overall handwritten shapes of each writer was successfully visualized as the locus in multidimensional space. The state of distribution such as the density in multidimensional space is considered to provide effective information to forensic examiners in evaluating the rarity of handwritten features observed in questioned document.","PeriodicalId":220495,"journal":{"name":"2014 11th IAPR International Workshop on Document Analysis Systems","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133382931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The RWTH Large Vocabulary Arabic Handwriting Recognition System","authors":"M. Hamdani, P. Doetsch, M. Kozielski, A. Mousa, H. Ney","doi":"10.1109/DAS.2014.61","DOIUrl":"https://doi.org/10.1109/DAS.2014.61","url":null,"abstract":"This paper describes the RWTH system for large vocabulary Arabic handwriting recognition. The recognizer is based on Hidden Markov Models (HMMs) with state of the art methods for visual/language modeling and decoding. The feature extraction is based on Recurrent Neural Networks (RNNs) which estimate the posterior distribution over the character labels for each observation. Discriminative training using the Minimum Phone Error (MPE) criterion is used to train the HMMs. The recognition is done with the help of n-gram Language Models (LMs) trained using in-domain text data. Unsupervised writer adaptation is also performed using the Constrained Maximum Likelihood Linear Regression (CMLLR) feature adaptation. The RWTH Arabic handwriting recognition system gave competitive results in previous handwriting recognition competitions. The used techniques allows to improve the performance of the system participating in the OpenHaRT 2013 evaluation.","PeriodicalId":220495,"journal":{"name":"2014 11th IAPR International Workshop on Document Analysis Systems","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131456619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"End-to-End Text Recognition Using Local Ternary Patterns, MSER and Deep Convolutional Nets","authors":"M. Opitz, Markus Diem, Stefan Fiel, Florian Kleber, Robert Sablatnig","doi":"10.1109/DAS.2014.29","DOIUrl":"https://doi.org/10.1109/DAS.2014.29","url":null,"abstract":"Text recognition in natural scene images is an application for several computer vision applications like licence plate recognition, automated translation of street signs, help for visually impaired people or image retrieval. In this work an end-to-end text recognition system is presented. For detection an AdaBoost ensemble with a modified Local Ternary Pattern (LTP) feature-set with a post-processing stage build upon Maximally Stable Extremely Region (MSER) is used. The text recognition is done using a deep Convolution Neural Network (CNN) trained with backpropagation. The system presented outperforms state of the art methods on the ICDAR 2003 dataset in the text-detection (F-Score: 74.2%), dictionary-driven cropped-word recognition (F-Score: 87.1%) and dictionary-driven end-to-end recognition (F-Score: 72.6%) tasks.","PeriodicalId":220495,"journal":{"name":"2014 11th IAPR International Workshop on Document Analysis Systems","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125047012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}