{"title":"Scene Text Extraction Using Focus of Mobile Camera","authors":"Egyul Kim, Seonghun Lee, J. H. Kim","doi":"10.1109/ICDAR.2009.21","DOIUrl":"https://doi.org/10.1109/ICDAR.2009.21","url":null,"abstract":"Robust extraction of text from scene images is essential for successful scene text recognition. Scene images usually have non-uniform illumination, complex backgrounds, and text-like objects. The common assumption of a homogeneous text region on a nearly uniform background cannot be maintained in real applications. We propose a text extraction method that utilizes the user's hint on the location of the text within the image. A resizable square rim in the viewfinder of the mobile camera, referred to here as a 'focus', is the interface used to help the user indicate the target text. With the hint from the focus, the color of the target text is easily estimated by clustering colors only within the focused section. Image binarization with the estimated color is performed to extract connected components. After obtaining the text region within the focused section, the text region is expanded iteratively by searching neighboring regions with the updated text color. This iterative method prevents one text region from being separated into multiple components due to non-uniform illumination and reflection. A text verification process is conducted on the extracted components to determine the true text region. It is demonstrated that the proposed method achieves high text extraction accuracy on moderately difficult examples from the ICDAR 2003 database.","PeriodicalId":433762,"journal":{"name":"2009 10th International Conference on Document Analysis and Recognition","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116212658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Pixel-level Statistical Structural Descriptor for Shape Measure and Recognition","authors":"Jing Zhang, Wenyin Liu","doi":"10.1109/ICDAR.2009.175","DOIUrl":"https://doi.org/10.1109/ICDAR.2009.175","url":null,"abstract":"A novel shape descriptor based on the histogram matrix of pixel-level structural features is presented. First, length ratios and angles between the centroid and contour points of a shape are calculated as two structural attributes. Then, the attributes are combined to construct a new histogram matrix in the feature space statistically. The proposed shape descriptor can measure circularity, smoothness, and symmetry of shapes, and be used to recognize shapes. Experimental results demonstrate the effectiveness of our method.","PeriodicalId":433762,"journal":{"name":"2009 10th International Conference on Document Analysis and Recognition","volume":"148 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116590668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Arabic and Latin Script Identification in Printed and Handwritten Types Based on Steerable Pyramid Features","authors":"Mohamed Benjelil, S. Kanoun, R. Mullot, A. Alimi","doi":"10.1109/ICDAR.2009.287","DOIUrl":"https://doi.org/10.1109/ICDAR.2009.287","url":null,"abstract":"Identifying Arabic and Latin scripts in both printed and handwritten forms presents several difficulties because Arabic script (printed or handwritten) and handwritten Latin script are cursive by nature. To avoid the confusions this can generate, we propose in this paper an accurate system designed for script identification at the word level, based on the steerable pyramid transform. Features extracted from the pyramid sub-bands are used to assign each word to exactly one of the candidate scripts. The encouraging and promising results obtained are presented in this paper.","PeriodicalId":433762,"journal":{"name":"2009 10th International Conference on Document Analysis and Recognition","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116781306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Authentication of Security Documents","authors":"Utpal Garain, Biswajit Halder","doi":"10.1109/ICDAR.2009.234","DOIUrl":"https://doi.org/10.1109/ICDAR.2009.234","url":null,"abstract":"This paper presents a pioneering effort towards machine authentication of security documents like bank cheques, legal deeds, certificates, etc. that fall under the same class as far as security is concerned. The proposed method first computationally extracts the security features from the document images, and the notion of ‘genuine’ vs. ‘duplicate’ is then defined in the feature space. Bank cheques are taken as a reference for conducting the present experiment. Support Vector Machines (SVMs) and Neural Networks (NNs) are employed to verify the authenticity of these cheques. Results on a test dataset of 200 samples show that the proposed approach achieves about 98% accuracy in discriminating duplicate cheques from genuine ones. This strongly attests to the viability of involving machines in authenticating security documents.","PeriodicalId":433762,"journal":{"name":"2009 10th International Conference on Document Analysis and Recognition","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125122815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian Similarity Model Estimation for Approximate Recognized Text Search","authors":"A. Takasu","doi":"10.1109/ICDAR.2009.193","DOIUrl":"https://doi.org/10.1109/ICDAR.2009.193","url":null,"abstract":"Approximate text search is a basic technique for handling recognized text that contains recognition errors. This paper proposes an approximate string search for recognized text using a statistical similarity model, focusing on parameter estimation. The main contribution of this paper is a parameter estimation algorithm using the variational Bayesian expectation maximization technique. We applied the obtained model to the approximate substring detection problem and experimentally showed that the Bayesian estimation is effective.","PeriodicalId":433762,"journal":{"name":"2009 10th International Conference on Document Analysis and Recognition","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114281234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Capturing Digital Ink as Retrieving Fragments of Document Images","authors":"K. Iwata, K. Kise, T. Nakai, M. Iwamura, S. Uchida, S. Omachi","doi":"10.1109/ICDAR.2009.192","DOIUrl":"https://doi.org/10.1109/ICDAR.2009.192","url":null,"abstract":"This paper presents a new method of capturing digital ink for pen-based computing. Current technologies such as tablets, ultrasonic pens, and Anoto pens rely on special mechanisms for locating the pen tip, which limits their applicability. Our proposal eases this problem: a camera pen that allows us to write on ordinary paper while capturing digital ink. A document image retrieval method called LLAH is tuned to locate the pen tip efficiently and accurately in the coordinates of a document by capturing only a tiny fragment of it. In this paper, we report results on captured digital ink and evaluate its quality.","PeriodicalId":433762,"journal":{"name":"2009 10th International Conference on Document Analysis and Recognition","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131265521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Confidence-Based Discriminative Training for Model Adaptation in Offline Arabic Handwriting Recognition","authors":"P. Dreuw, G. Heigold, H. Ney","doi":"10.1109/ICDAR.2009.116","DOIUrl":"https://doi.org/10.1109/ICDAR.2009.116","url":null,"abstract":"We present a novel confidence-based discriminative training approach for model adaptation in an HMM-based Arabic handwriting recognition system, designed to handle different handwriting styles and their variations. Most current approaches are maximum-likelihood trained HMM systems that try to adapt their models to different writing styles using writer adaptive training, unsupervised clustering, or additional writer-specific data. Discriminative training based on the Maximum Mutual Information criterion is used to train writer-independent handwriting models. For model adaptation during decoding, an unsupervised confidence-based discriminative training at the word and frame level within a two-pass decoding process is proposed. Additionally, the training criterion is extended to incorporate a margin term. The proposed methods are evaluated on the IFN/ENIT Arabic handwriting database, where the proposed novel adaptation approach decreases the word error rate by 33% relative.","PeriodicalId":433762,"journal":{"name":"2009 10th International Conference on Document Analysis and Recognition","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126983741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pre-Processing of Degraded Printed Documents by Non-local Means and Total Variation","authors":"Laurence Likforman-Sulem, J. Darbon, E. B. Smith","doi":"10.1109/ICDAR.2009.210","DOIUrl":"https://doi.org/10.1109/ICDAR.2009.210","url":null,"abstract":"We compare in this study two image restoration approaches for the pre-processing of printed documents: the Non-local Means filter and a total variation minimization approach. We apply these two approaches to printed document sets from various periods, and we evaluate their effectiveness through character recognition performance using an open-source OCR. Our results show that for each document set, one or both pre-processing methods improve character recognition accuracy over recognition without pre-processing. Higher accuracies are obtained with Non-local Means when characters have a low level of degradation, since they can be restored from similar neighboring parts of non-degraded characters. The total variation approach is more effective when characters are highly degraded and can only be restored through modeling rather than through neighboring data.","PeriodicalId":433762,"journal":{"name":"2009 10th International Conference on Document Analysis and Recognition","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127698767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Approach for Rotation Free Online Handwritten Chinese Character Recognition","authors":"Shengming Huang, Lianwen Jin, Jin Lv","doi":"10.1109/ICDAR.2009.114","DOIUrl":"https://doi.org/10.1109/ICDAR.2009.114","url":null,"abstract":"This paper presents a method for rotation-free online handwritten Chinese character recognition (RFOHCCR). Given a skewed online handwritten character sample, two orientation correction steps, angle rectification according to the starting point and angle readjustment based on the principal direction axes, are first performed to rectify the skew angle of the sample. Then an 8-directional feature is extracted and the character is classified using a classifier trained on artificially rotated samples. Experiments on the 863 online Chinese character dataset and the SCUT-COUCH dataset show the effectiveness of the proposed approach.","PeriodicalId":433762,"journal":{"name":"2009 10th International Conference on Document Analysis and Recognition","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132725588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Keyword Spotting in Document Images through Word Shape Coding","authors":"Shuyong Bai, Linlin Li, C. Tan","doi":"10.1109/ICDAR.2009.54","DOIUrl":"https://doi.org/10.1109/ICDAR.2009.54","url":null,"abstract":"With large databases of document images available, a method for users to find keywords in documents is useful. One approach is to perform Optical Character Recognition (OCR) on each document followed by indexing of the resulting text. However, if the quality of the documents is poor or time is critical, complete OCR of all images is infeasible. This paper builds upon previous work on Word Shape Coding to propose an alternative technique and combination of feature descriptors for keyword spotting without the use of OCR. Different sequence alignment similarity measures can be used for partial or whole-word matching. The proposed technique is tolerant to serifs, font styles, and certain degrees of touching, broken, or overlapping characters. It improves over previous works with not only better precision and a lower collision rate but, more importantly, the ability to perform partial matching. Experimental results show that it is about 15 times faster than OCR. It is a promising technique for improving document image retrieval.","PeriodicalId":433762,"journal":{"name":"2009 10th International Conference on Document Analysis and Recognition","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132840768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}