Recognition of offline handwritten numerals using an ensemble of MLPs combined by Adaboost
Tarun Jindal, U. Bhattacharya
MOCR '13 | Pub Date: 2013-08-24 | DOI: 10.1145/2505377.2505380
Abstract: In this article, we present our recent study of offline recognition of handwritten numerals of three Indian scripts: Devanagari, Bangla and Oriya. We propose a novel approach, based on the Adaboost technique, to combining multiple MLP classifiers with varying numbers of hidden nodes. In this recognition study, we used Zernike moment features of different orders. We obtained classification results for a number of orders of this moment function, and the best classification result for each script was obtained when the feature vector consisted of moment values up to order 8. It is well known that the classification performance of an MLP largely depends on the choice of the number of hidden nodes. In the present work, we studied the use of boosting as a solution to this problem of using an MLP as a classifier in real-life applications: an ensemble of MLP classifiers with different hidden layer sizes is built, and their classification results are combined using the Adaboost technique. Classification results are reported on publicly available databases [1] of offline handwritten numeral images of three Indian scripts.

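The abstract does not spell out the combination rule. As a minimal sketch, assuming the multi-class SAMME variant of AdaBoost and treating each MLP hidden-layer size as one boosting round, the classifier weighting and the weighted vote could look like this. `samme_boost`, `predict_ensemble` and the trainer callables are hypothetical; actual MLP training is hidden behind a callable that receives per-sample weights.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple

# A trained classifier maps one sample to a class index in [0, n_classes).
Predictor = Callable[[Any], int]
# A trainer fits a classifier on (X, y) with per-sample weights w.
# Hypothetical: stands in for, e.g., an MLP with one fixed hidden-layer size.
Trainer = Callable[[Sequence[Any], Sequence[int], Sequence[float]], Predictor]


def samme_boost(trainers: Sequence[Trainer], X: Sequence[Any],
                y: Sequence[int], n_classes: int) -> List[Tuple[float, Predictor]]:
    """Multi-class AdaBoost (SAMME): one boosting round per trainer."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    ensemble: List[Tuple[float, Predictor]] = []
    for train in trainers:
        predict = train(X, y, w)
        wrong = [predict(x) != t for x, t in zip(X, y)]
        err = sum(wi for wi, bad in zip(w, wrong) if bad)  # weights sum to 1
        if err >= 1.0 - 1.0 / n_classes:
            continue                       # no better than chance: discard it
        err = max(err, 1e-10)              # avoid division by zero
        alpha = math.log((1.0 - err) / err) + math.log(n_classes - 1)
        ensemble.append((alpha, predict))
        # Up-weight misclassified samples, then renormalise.
        w = [wi * math.exp(alpha) if bad else wi for wi, bad in zip(w, wrong)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict_ensemble(ensemble: Sequence[Tuple[float, Predictor]],
                     x: Any, n_classes: int) -> int:
    """Weighted vote: each member adds its alpha to the class it predicts."""
    votes = [0.0] * n_classes
    for alpha, predict in ensemble:
        votes[predict(x)] += alpha
    return max(range(n_classes), key=lambda k: votes[k])
```

Members that do no better than chance get no vote, and accurate members receive a larger `alpha`, which is what lets an ensemble of differently sized networks sidestep choosing a single hidden-layer size.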
Can we build language-independent OCR using LSTM networks?
A. Ul-Hasan, T. Breuel
MOCR '13 | Pub Date: 2013-08-24 | DOI: 10.1145/2505377.2505394
Abstract: Language models or recognition dictionaries are usually considered an essential step in OCR. However, using a language model complicates the training of OCR systems, and it also narrows the range of texts that an OCR system can be used with. Recent results have shown that Long Short-Term Memory (LSTM) based OCR yields low error rates even without language modeling. In this paper, we explore to what extent LSTM models can be used for multilingual OCR without language models. To do this, we measure the cross-language performance of LSTM models trained on different languages. LSTM models show good promise for language-independent OCR: the recognition errors are very low (around 1%) without any language model or dictionary correction.

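The abstract does not describe the decoder. LSTM line recognizers of this kind are commonly trained with Connectionist Temporal Classification (CTC); assuming that setup, the simplest way to turn per-frame network outputs into text without consulting any language model or dictionary is best-path (greedy) decoding, sketched below. The function name, the blank index 0 and the toy alphabet are assumptions for illustration.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC "blank" label


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      alphabet: str) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    alphabet[k - 1] is the character for label k (label 0 is the blank).
    No language model or dictionary is consulted.
    """
    best = [max(range(len(p)), key=lambda k: p[k]) for p in frame_probs]
    out: List[str] = []
    prev = None
    for label in best:
        # Emit a character only when the label changes and is not blank.
        if label != prev and label != BLANK:
            out.append(alphabet[label - 1])
        prev = label
    return "".join(out)
```

A blank between two identical labels is what allows doubled letters: the frame sequence `a a blank a b b blank` decodes to `aab`.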
Recognition of Nastalique Urdu ligatures
Gurpreet Singh Lehal, Ankur Rana
MOCR '13 | Pub Date: 2013-08-24 | DOI: 10.1145/2505377.2505379
Abstract: There has been considerable work on Arabic OCR. However, all of that work is based on the Naskh style. The Urdu script is based on the Arabic alphabet but uses the Nastalique style. The Nastalique style makes OCR in general, and character segmentation in particular, a highly challenging task, so most researchers avoid the character segmentation phase and opt for a larger unit of recognition. For Urdu, the next larger recognition unit considered by researchers is the ligature, which lies between the character and the word. A ligature is a connected component of one or more characters, and an Urdu word is usually composed of 1 to 8 ligatures. There are more than 25,000 Urdu ligatures, of which the top 4567 account for 99% coverage. From the OCR point of view, a ligature can be further segmented into one primary connected component and zero or more secondary connected components. The primary component represents the basic shape of the ligature, while the secondary connected components correspond to the dots, diacritic marks and special symbols associated with the ligature. To reduce the class count, ligatures with similar primary components are grouped together. In this paper, we present a system that recognizes 9262 ligatures formed from 2190 primary and 17 secondary components. Various combinations of DCT, Gabor-filter and zoning-based features with kNN, HMM and SVM classifiers were tried, and a recognition accuracy of 98% is reported on pre-segmented ligatures.

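To illustrate how decomposing ligatures into components shrinks the classifiers' label sets, here is a hypothetical sketch: the classifiers only distinguish primary and secondary component classes (2190 and 17 in the paper), and the ligature label is recovered by looking up the recognized combination in a table. The table layout, the sorted-tuple key and the example IDs are assumptions, not the authors' data structures; in particular, this key ignores where each secondary component sits relative to the primary, which a real system would likely need to encode.

```python
from typing import Dict, Iterable, Optional, Tuple

# Key: (primary component class, sorted secondary component classes).
LigatureKey = Tuple[int, Tuple[int, ...]]


def build_ligature_table(
        ligatures: Iterable[Tuple[str, int, Iterable[int]]]) -> Dict[LigatureKey, str]:
    """Map each (primary, secondaries) combination to its ligature label."""
    table: Dict[LigatureKey, str] = {}
    for label, primary, secondaries in ligatures:
        table[(primary, tuple(sorted(secondaries)))] = label
    return table


def resolve_ligature(table: Dict[LigatureKey, str], primary: int,
                     secondaries: Iterable[int]) -> Optional[str]:
    """Combine separately recognized components into a ligature label.

    Returns None when the combination is not a known ligature.
    """
    return table.get((primary, tuple(sorted(secondaries))))
```

If three ligatures share primary class 5 and differ only in their dots, the primary-component classifier needs one class for them instead of three.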
Low resolution Arabic recognition with multidimensional recurrent neural networks
Sheikh Faisal Rashid, M. Schambach, J. Rottland, Stephan von der Nüll
MOCR '13 | Pub Date: 2013-08-24 | DOI: 10.1145/2505377.2505385
Abstract: OCR of multi-font Arabic text is difficult due to large variations in character shapes from one font to another. It becomes even more challenging if the text is rendered at very low resolution. This paper describes a multi-font, low resolution, and open vocabulary OCR system based on a multidimensional recurrent neural network architecture. For this work, we have developed various systems, trained for single-font/single-size, single-font/multi-size, and multi-font/multi-size data of the well known Arabic printed text image database (APTI). The evaluation tasks from the second Arabic text recognition competition, organized in conjunction with ICDAR 2013, have been adopted. Ten Arabic fonts in six font size categories are used for evaluation. Results show that the proposed method performs very well on the task of printed Arabic text recognition even for very low resolution and small font size images. Overall, the system yields above 99% recognition accuracy at character and word level for most of the printed Arabic fonts.

A robust table registration method for batch table OCR processing
Jinyu Zuo, Esin Darici
MOCR '13 | Pub Date: 2013-08-24 | DOI: 10.1145/2505377.2505383
Abstract: This paper proposes a robust table registration method for better understanding of the structured information in scanned table images. Scanned images can be heavily degraded by scanning effects, binarization, or the condition of the document itself. When batch processing images that share the same table structure, a table model is normally provided and can be used to overcome most of the challenging quality factors. In this paper, the given table model is used as the ground truth. However, only rough precision is needed for the table cell dimensions, which makes the table model easier to provide. The method was tested on Multilingual Automatic Document Classification Analysis and Translation (MADCAT) images and achieved promising performance.

Word level script recognition for Uighur document mixed with English script
H. Ye, Liangrui Peng
MOCR '13 | Pub Date: 2013-08-24 | DOI: 10.1145/2505377.2505387
Abstract: Script recognition is one of the key technologies in Uighur OCR research, as it is common to find English words or sentences in Uighur documents, especially scientific ones. This paper presents a word-level script recognition method. The original Uighur text images are segmented into text lines, and the text line images are then segmented into word-level images. Features are extracted from sub-blocks of the word-level images. Two features, the edge-hinge feature and the Gabor feature, are introduced and compared. An SVM is adopted as the classifier and trained on labeled segmented word images. The final script recognition result is obtained by fusing the results of the sub-blocks of each segmented word image. Experiments on segmented word images and text line images demonstrate the effectiveness of the proposed method.

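The abstract does not say how the sub-blocks are formed or how their results are fused. As a hypothetical sketch, assuming equal-width vertical strips and fusion by averaging per-strip class scores, the word-level decision could look like this. `score_block` stands in for feature extraction (edge-hinge or Gabor) followed by the SVM; its name and dictionary interface are assumptions.

```python
from typing import Callable, Dict, List

Image = List[List[int]]  # binary word image, one list of pixels per row


def split_vertical(img: Image, n_blocks: int) -> List[Image]:
    """Split a word image into n_blocks vertical strips of near-equal width.

    Assumes a non-empty image and n_blocks >= 1; empty strips are dropped.
    """
    width = len(img[0])
    bounds = [round(i * width / n_blocks) for i in range(n_blocks + 1)]
    return [[row[bounds[i]:bounds[i + 1]] for row in img]
            for i in range(n_blocks) if bounds[i] < bounds[i + 1]]


def classify_word(img: Image, n_blocks: int,
                  score_block: Callable[[Image], Dict[str, float]]) -> str:
    """Average per-block script scores and return the highest-scoring script."""
    blocks = split_vertical(img, n_blocks)
    totals: Dict[str, float] = {}
    for block in blocks:
        for script, s in score_block(block).items():
            totals[script] = totals.get(script, 0.0) + s / len(blocks)
    return max(totals, key=totals.__getitem__)
```

Averaging scores rather than taking a majority of hard labels lets a confident block outweigh several ambiguous ones; a majority vote would be an equally plausible reading of "fusing".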
Levenshtein distance metric based holistic handwritten word recognition
S. D. Chowdhury, U. Bhattacharya, S. K. Parui
MOCR '13 | Pub Date: 2013-08-24 | DOI: 10.1145/2505377.2505378
Abstract: The rapid spread of pen-based and touch-screen digital devices, coupled with their affordability and their capability to take technology and data digitization to the grassroots, has made online handwriting recognition an active field of research. Research on online handwriting recognition for Indian scripts is particularly relevant because the challenges posed by Indian scripts differ from those of other scripts, not only because of their extremely large alphabet size but also because the inter-class variability among several classes is very small. In this article, we introduce a limited-vocabulary recognizer for online, unconstrained handwritten Bangla (a major Indian script) words, based on a novel word-level feature representation. Three different features are extracted from a word sample, and an event string is generated for each of them. A distance function based on the Levenshtein distance metric is formulated to compute the distance between the two triplets of event strings that represent two word samples. A nearest-neighbour scheme is used to classify the input sample. We tested the proposed approach on vocabularies of varying sizes, and the recognition performance is encouraging.

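The abstract says only that the distance function "uses" the Levenshtein metric on triplets of event strings. A minimal sketch, assuming the word distance is the unweighted sum of the three per-feature Levenshtein distances and that classification is 1-nearest-neighbour over a labeled reference set. The summation rule, the function names and the toy event strings are assumptions.

```python
from typing import Sequence, Tuple

Triplet = Tuple[str, str, str]  # one event string per feature


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions (two-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def word_distance(u: Triplet, v: Triplet) -> int:
    """Assumed combination: sum of the per-feature Levenshtein distances."""
    return sum(levenshtein(s, t) for s, t in zip(u, v))


def nearest_neighbour(sample: Triplet,
                      references: Sequence[Tuple[Triplet, str]]) -> str:
    """Return the word label of the closest reference triplet."""
    return min(references, key=lambda ref: word_distance(sample, ref[0]))[1]
```

A weighted sum, with one weight per feature tuned on validation data, would be a natural variant if one feature's event strings prove more discriminative than the others.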
Multilingual OCR research and applications: an overview
Xujun Peng, Huaigu Cao, S. Setlur, V. Govindaraju, P. Natarajan
MOCR '13 | Pub Date: 2013-08-24 | DOI: 10.1145/2505377.2509977
Abstract: This paper offers an overview of current research approaches in the field of off-line multilingual OCR. Typically, off-line OCR systems are designed for a particular script or language. However, the ideal approach to multilingual OCR would likely be a system that, given language-specific training data, can be re-targeted to process different languages with minimal modifications. This is still an open area of research with plenty of challenges, particularly for multilingual handwriting recognition, where writing styles vary even within the same script. The paper presents the challenges for multilingual OCR in preprocessing, feature extraction, script identification and recognition modeling, together with a brief survey of research in these areas. Ideas for future research in multilingual OCR are outlined.

Text graphic separation in Indian newspapers
Ritu Garg, Anukriti Bansal, S. Chaudhury, Sumantra Dutta Roy
MOCR '13 | Pub Date: 2013-08-24 | DOI: 10.1145/2505377.2505393
Abstract: Digitization of newspaper articles is important for recording historical events. Layout analysis of Indian newspapers is a challenging task due to varying font sizes and styles and the random placement of text and non-text regions. In this paper, we propose a novel framework for learning optimal parameters for text-graphic separation in the presence of complex layouts. The learning problem is formulated as an optimization problem that uses the EM algorithm to learn optimal parameters according to the nature of the document content.

A bilingual Gurmukhi-English OCR based on multiple script identifiers and language models
Gurpreet Singh Lehal
MOCR '13 | Pub Date: 2013-08-24 | DOI: 10.1145/2505377.2505381
Abstract: English words are frequently encountered in Gurmukhi texts, and a monolingual Gurmukhi OCR recognizes such words as garbage. It therefore becomes necessary to add bilingual capability to the Gurmukhi OCR so that it also recognizes English text. However, adding bilingual capability reduces the recognition accuracy on monolingual texts because of errors in script identification: even a system with 99% script identification accuracy suffers a 1% reduction in recognition accuracy on monolingual text. In this paper, we present a bilingual OCR that recognizes both English and Gurmukhi scripts without any significant reduction in recognition accuracy on monolingual Gurmukhi text compared with the monolingual Gurmukhi OCR. This is achieved by using multiple script identification engines and language models for both English and Gurmukhi scripts. This is the first such system to be developed, and it recognizes with high accuracy document images containing mixed Gurmukhi and English text, or only Gurmukhi or only English text.
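The abstract does not detail how the multiple script identifiers and the language models are combined. One hypothetical arrangement: accept the script when all identifiers agree, and otherwise run both recognizers and keep the output that its own language model scores higher. Every function name and interface here is an assumption for illustration, not the paper's algorithm; in practice the two language models' scores would need calibrating before they can be compared.

```python
from typing import Callable, Dict, Sequence, Tuple

Identifier = Callable[[object], str]  # word image -> "gurmukhi" or "english"
Recognizer = Callable[[object], str]  # word image -> recognized text
LMScore = Callable[[str], float]      # text -> log-probability-like score


def recognize_word(word_img: object,
                   identifiers: Sequence[Identifier],
                   recognizers: Dict[str, Recognizer],
                   lm_scores: Dict[str, LMScore]) -> Tuple[str, str]:
    """Return (script, text) for one word image."""
    votes = {ident(word_img) for ident in identifiers}
    if len(votes) == 1:
        script = votes.pop()  # unanimous: trust the script identifiers
        return script, recognizers[script](word_img)
    # Identifiers disagree: recognize with every script's OCR and let each
    # script's language model judge its own output.
    candidates = {s: recognizers[s](word_img) for s in recognizers}
    best = max(candidates, key=lambda s: lm_scores[s](candidates[s]))
    return best, candidates[best]
```

The extra recognition pass is paid only on words where the identifiers disagree, which is one way a bilingual system could avoid losing accuracy on purely monolingual pages.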