Bag-of-features HMMs for segmentation-free Bangla word spotting
Leonard Rothacker, G. Fink, P. Banerjee, U. Bhattacharya, B. Chaudhuri
MOCR '13, 2013-08-24. DOI: 10.1145/2505377.2505384
Abstract: In this paper we present how Bag-of-Features Hidden Markov Models can be applied to printed Bangla word spotting. These statistical models allow for easy adaptation to different problem domains, made possible by integrating automatically estimated visual appearance features with Hidden Markov Models for sequential spatial modeling. In our evaluation we report high retrieval scores on a new printed Bangla dataset. Furthermore, we outperform state-of-the-art results on the well-known George Washington word spotting benchmark. Both results were achieved with an almost identical method configuration.
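The bag-of-features front end the abstract describes can be sketched as follows: local appearance descriptors are quantized against a learned codebook, turning each sliding-window frame of a word image into a discrete visual word that an HMM can consume. This is a toy illustration under assumed names and shapes, not the authors' implementation:

```python
import numpy as np

def build_codebook(descriptors, k, iters=10, seed=0):
    """Toy k-means to estimate a visual-word codebook from local descriptors."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest center
        dists = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def word_image_to_sequence(frame_descriptors, centers):
    """Map each frame's descriptor to its nearest visual word, yielding the
    discrete observation sequence a word-spotting HMM would consume."""
    dists = np.linalg.norm(frame_descriptors[:, None] - centers[None], axis=2)
    return dists.argmin(axis=1)
```

The resulting integer sequence plays the role of the HMM's observations; the paper's actual feature estimation and model topology differ in detail.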
HMM-based script identification for OCR
Dmitriy Genzel, Ashok Popat, R. Teunen, Yasuhisa Fujii
MOCR '13, 2013-08-24. DOI: 10.1145/2505377.2505382
Abstract: While current OCR systems are able to recognize text in an increasing number of scripts and languages, typically they still need to be told in advance what those scripts and languages are. We propose an approach that repurposes the same HMM-based system used for OCR to the task of script/language ID, by replacing character labels with script class labels. We apply it in a multi-pass overall OCR process which achieves "universal" OCR over 54 tested languages in 18 distinct scripts, over a wide variety of typefaces in each. For comparison we also consider a brute-force approach, wherein a single HMM-based OCR system is trained to recognize all considered scripts. Results are presented on a large and diverse evaluation set extracted from book images, both for script identification accuracy and for overall OCR accuracy. On this evaluation data, the script ID system provided a script ID error rate of 1.73% for 18 distinct scripts. The end-to-end OCR system with the script ID system achieved a character error rate of 4.05%, an increase of 0.77% over the case where the languages are known a priori.
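The core relabeling idea — collapse character labels to script labels, identify the script, then dispatch to a script-specific recognizer — can be illustrated with a toy sketch. A majority vote stands in for the paper's HMM decoding, and all names here are assumptions:

```python
# Hypothetical character -> script mapping (one sample character per script).
SCRIPT_OF = {"अ": "Devanagari", "ক": "Bengali", "a": "Latin", "ب": "Arabic"}

def identify_script(char_hypotheses):
    """Collapse per-character labels to script labels and take a majority
    vote -- a stand-in for decoding with script-class labels in the HMM."""
    votes = {}
    for ch in char_hypotheses:
        script = SCRIPT_OF.get(ch)
        if script:
            votes[script] = votes.get(script, 0) + 1
    return max(votes, key=votes.get)

def ocr_line(line_chars, models):
    """Multi-pass flavor: identify the script first, then run the
    script-specific recognizer on the same input."""
    return models[identify_script(line_chars)](line_chars)
```

In the paper the first pass is itself an HMM decode over script-class labels; the dispatch structure is what this sketch preserves.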
Ruling-based table analysis for noisy handwritten documents
Jin Chen, D. Lopresti
MOCR '13, 2013-08-24. DOI: 10.1145/2505377.2505392
Abstract: Table analysis can be a valuable step in document image analysis. In the case of noisy handwritten documents, various artifacts complicate the task of locating tables on a page and segmenting them into cells. Our ruling-based approach first detects line segments to ensure high recall of table rulings, and then computes the intersections of horizontal and vertical rulings as key points. We then employ an optimization procedure to select the most probable subset of these key points which constitute the table structure. Finally, we decompose a table into a 2-D arrangement of cells using the key points. Experimental evaluation involving 61 handwritten pages from 17 table classes shows a table-cell precision of 89% and a recall of 88%.
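The key-point step the abstract describes — intersecting detected horizontal and vertical ruling segments — is simple to sketch. The segment representation and tolerance below are assumptions, not the paper's:

```python
def ruling_intersections(h_rulings, v_rulings, tol=2.0):
    """Candidate table key points: intersections of horizontal and vertical
    ruling segments.  A horizontal ruling is (y, x0, x1); a vertical ruling
    is (x, y0, y1).  `tol` absorbs small detection noise at segment ends."""
    points = []
    for y, x0, x1 in h_rulings:
        for x, y0, y1 in v_rulings:
            if x0 - tol <= x <= x1 + tol and y0 - tol <= y <= y1 + tol:
                points.append((x, y))
    return points
```

In the paper these candidates then go through an optimization step that keeps only the subset consistent with a table structure; this sketch stops at candidate generation.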
Multi-script robust reading competition in ICDAR 2013
D. Kumar, M. Prasad, A. Ramakrishnan
MOCR '13, 2013-08-24. DOI: 10.1145/2505377.2505390
Abstract: A competition was organized by the authors to detect text from scene images. The motivation was to look for script-independent algorithms that detect and extract text from scene images, and that may be applied directly to an unknown script. The competition had four distinct tasks: (i) text localization and (ii) text segmentation from scene images containing one or more of Kannada, Tamil, Hindi, Chinese and English words, and (iii) English and (iv) Kannada word recognition from scene word images. There were four submissions in total for the text localization and segmentation tasks. For the other two tasks, we evaluated two algorithms we have published previously, namely nonlinear enhancement and selection of plane, and midline analysis and propagation of segmentation. A complete picture of where each algorithm stands is discussed, and suggestions are provided to improve the quality of the algorithms. A graphical depiction of the f-scores of individual images, in the form of benchmark values, is proposed to show the strength of an algorithm.
An approach for Bangla and Devanagari video text recognition
P. Banerjee, B. Chaudhuri
MOCR '13, 2013-08-24. DOI: 10.1145/2505377.2505389
Abstract: Extraction and recognition of Bangla text from video frame images is challenging due to variations in font type and style, complex color backgrounds, low resolution, low contrast, etc. In this paper, we propose a two-step algorithm for extracting and recognizing Bangla and Devanagari text from video frames with complex backgrounds. After text localization, each text line is segmented into words using information based on line contours, with first-order gradient values of the text blocks used to find the word gaps. An adaptive SIS binarization technique is then applied to each word, and the binarized text block is sent to a state-of-the-art OCR engine for recognition.
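The gradient-based word-gap step can be sketched as scanning a column profile of first-order gradient energy for sufficiently long low-energy runs. The profile representation, threshold, and minimum gap width below are illustrative assumptions:

```python
import numpy as np

def word_gaps(col_energy, min_gap=5, thresh=1e-3):
    """Find word boundaries in a text line as runs of low-gradient-energy
    columns.  col_energy[i] is the summed first-order gradient magnitude in
    column i; a run of at least `min_gap` quiet columns counts as a gap."""
    low = col_energy < thresh
    gaps, start = [], None
    for i, is_low in enumerate(low):
        if is_low and start is None:
            start = i                      # a quiet run begins
        elif not is_low and start is not None:
            if i - start >= min_gap:
                gaps.append((start, i))    # quiet run long enough to be a gap
            start = None
    if start is not None and len(low) - start >= min_gap:
        gaps.append((start, len(low)))     # line ends inside a quiet run
    return gaps
```

Each returned interval separates two word candidates, which the pipeline would then binarize and pass to OCR.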
Unconstrained handwritten Devanagari character recognition using convolutional neural networks
Kapil Mehrotra, Saumya Jetley, Akash Deshmukh, S. Belhe
MOCR '13, 2013-08-24. DOI: 10.1145/2505377.2505386
Abstract: In this paper, we introduce a novel offline strategy for recognizing online handwritten Devanagari characters entered in an unconstrained manner. Unlike previous approaches based on standard classifiers (SVM, HMM, ANN) trained on statistical, structural or spectral features, our CNN-based method allows writers to enter characters in any number or order of strokes and is also robust to a certain amount of overwriting. The CNN architecture supports an increased set of 42 Devanagari character classes. Experiments with 10 different CNN configurations, using both exponential-decay and inverse-scale-annealing approaches to convergence, show highly promising results. In a further improvement, the final-layer neuron outputs of the top 3 configurations are averaged to make the classification decision, achieving an accuracy of 99.82% on the training data and 98.19% on the test data. This marks an improvement of 0.2% and 5.81% on the training and test sets, respectively, over the existing state of the art for unconstrained input. The data used to build the system was collected as isolated words from different Devanagari-writing states in India. Character-level data is extracted from the collected words using a hybrid approach and covers all variations owing to different writing styles and varied parent-word structures.
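The ensemble step — averaging the final-layer outputs of the top 3 configurations before taking the decision — is a one-liner worth making concrete. A minimal sketch, with shapes assumed:

```python
import numpy as np

def ensemble_decision(outputs):
    """Average the final-layer output vectors of several trained
    configurations and return the argmax class index."""
    return int(np.mean(np.stack(outputs), axis=0).argmax())
```

Averaging raw output vectors lets a confident correct model outvote two weakly wrong ones, which is the usual motivation for this kind of late fusion.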
Global and local features for recognition of online handwritten numerals and Tamil characters
A. Ramakrishnan, K. Urala
MOCR '13, 2013-08-24. DOI: 10.1145/2505377.2505391
Abstract: Feature extraction is a key step in the recognition of online handwritten data and is well investigated in the literature. For Tamil online handwritten characters, global features such as those derived from the discrete Fourier transform (DFT), the discrete cosine transform (DCT) and wavelet transforms have been used to capture overall information about the data. On the other hand, local features such as (x, y) coordinates, nth derivatives, curvature and angular features have also been used. In this paper, we investigate the efficacy of using global features alone (DFT, DCT), local features alone (preprocessed (x, y) coordinates), and a combination of both. Our classifier, a support vector machine (SVM) with a radial basis function (RBF) kernel, is trained and tested on the IWFHR 2006 Tamil handwritten character recognition competition dataset. We obtain more than 95% accuracy on the test dataset, which exceeds the best score reported in the literature. Further, using a combination of global and local features on a publicly available database of Indo-Arabic numerals, we obtain an accuracy of more than 98%.
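The global-plus-local feature combination can be sketched by taking low-order DCT coefficients of the x and y coordinate sequences (global) and concatenating them with the preprocessed coordinates themselves (local). The coefficient count and layout are assumptions, not the paper's exact recipe:

```python
import numpy as np

def dct_ii(signal, n_coeff):
    """First n_coeff coefficients of the Type-II DCT of a 1-D sequence."""
    N = len(signal)
    n = np.arange(N)
    return np.array([np.sum(signal * np.cos(np.pi * k * (2 * n + 1) / (2 * N)))
                     for k in range(n_coeff)])

def stroke_features(xs, ys, n_coeff=10):
    """Global DCT features of the x and y coordinate sequences, concatenated
    with the (preprocessed) coordinates as local features."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    global_feat = np.concatenate([dct_ii(xs, n_coeff), dct_ii(ys, n_coeff)])
    local_feat = np.column_stack([xs, ys]).ravel()   # x1,y1,x2,y2,...
    return np.concatenate([global_feat, local_feat])
```

The fixed-length vector this produces is what an SVM with an RBF kernel would be trained on; in practice the trajectory is first resampled to a fixed number of points so the local part has constant dimension.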
Re-targeting of multi-script document images for handheld devices
Soumyadeep Dey, J. Mukhopadhyay, S. Sural, Partha Bhowmick
MOCR '13, 2013-08-24. DOI: 10.1145/2505377.2505388
Abstract: We propose a technique for transforming the layout of a printed document image into a new, user-conducive layout, with the objective of providing a better display on a low-resolution screen for comfortable and convenient reading. Re-targeting starts by analyzing the document image in the spatial domain to identify its paragraphs. Text lines, words, characters and hyphenations are then recognized from each paragraph, and the necessary word stitching is performed to reproduce the paragraph as appropriate to the resolution of the display device. Test results and a related subjective evaluation on different datasets, especially pages scanned from Bengali and English magazines, demonstrate the strength and effectiveness of the proposed technique.
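The word-stitching step the abstract mentions amounts to reflowing extracted word images into new lines that fit the target screen width. A greedy sketch under assumed inputs (word widths in pixels, a fixed inter-word space):

```python
def reflow_words(word_widths, screen_width, space=4):
    """Greedy re-targeting: stitch recognized words into new text lines that
    fit a low-resolution display of width `screen_width` pixels."""
    lines, current, current_width = [], [], 0
    for w in word_widths:
        added = w if not current else w + space   # leading word needs no space
        if current and current_width + added > screen_width:
            lines.append(current)                 # line full: start a new one
            current, current_width = [w], w
        else:
            current.append(w)
            current_width += added
    if current:
        lines.append(current)
    return lines
```

The paper additionally handles hyphenated words (re-joining the halves before reflow) and scales content to the device resolution; this sketch covers only the line-breaking core.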