{"title":"Automatic Selection of Parameters for Document Image Enhancement Using Image Quality Assessment","authors":"Ritu Garg, S. Chaudhury","doi":"10.1109/DAS.2016.53","DOIUrl":"https://doi.org/10.1109/DAS.2016.53","url":null,"abstract":"Performance of most of the recognition engines for document images is effected by quality of the image being processed and the selection of parameter values for the pre-processing algorithm. Usually the choice of such parameters is done empirically. In this paper, we propose a novel framework for automatic selection of optimal parameters for pre-processing algorithm by estimating the quality of the document image. Recognition accuracy can be used as a metric for document quality assessment. We learn filters that capture the script properties and degradation to predict recognition accuracy. An EM based framework has been formulated to iteratively learn optimal parameters for document image pre-processing. In the E-step, we estimate the expected accuracy using the current set of parameters and filters. In the M-step we compute parameters to maximize the expected recognition accuracy found in E-step. The experiments validate the efficacy of the proposed methodology for document image pre-processing applications.","PeriodicalId":197359,"journal":{"name":"2016 12th IAPR Workshop on Document Analysis Systems (DAS)","volume":"87 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114089559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Creating Ground Truth for Historical Manuscripts with Document Graphs and Scribbling Interaction","authors":"A. Garz, Mathias Seuret, Fotini Simistira, Andreas Fischer, R. Ingold","doi":"10.1109/DAS.2016.29","DOIUrl":"https://doi.org/10.1109/DAS.2016.29","url":null,"abstract":"Ground truth is both - indispensable for training and evaluating document analysis methods, and yet very tedious to create manually. This especially holds true for complex historical manuscripts that exhibit challenging layouts with interfering and overlapping handwriting. In this paper, we propose a novel semi-automatic system to support layout annotations in such a scenario based on document graphs and a pen-based scribbling interaction. On the one hand, document graphs provide a sparse page representation that is already close to the desired ground truth and on the other hand, scribbling facilitates an efficient and convenient pen-based interaction with the graph. The performance of the system is demonstrated in the context of a newly introduced database of historical manuscripts with complex layouts.","PeriodicalId":197359,"journal":{"name":"2016 12th IAPR Workshop on Document Analysis Systems (DAS)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128536429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High Performance OCR for Camera-Captured Blurred Documents with LSTM Networks","authors":"Fallak Asad, A. Ul-Hasan, F. Shafait, A. Dengel","doi":"10.1109/DAS.2016.69","DOIUrl":"https://doi.org/10.1109/DAS.2016.69","url":null,"abstract":"Documents are routinely captured by digital cameras in today's age owing to the availability of high quality cameras in smart phones. However, recognition of camera-captured documents is substantially more challenging as compared to traditional flat bed scanned documents due to the distortions introduced by the cameras. One of the major performancelimiting artifacts is the motion and out-of-focus blur that is often induced in the document during the capturing process. Existing approaches try to detect presence of blur in the document to inform the user for re-capturing the image. This paper reports, for the first time, an Optical Character Recognition (OCR) system that can directly recognize blurred documents on which the stateof-the-art OCR systems are unable to provide usable results. Our presented system is based on the Long Short-Term Memory (LSTM) networks and has shown promising character recognition results on both the motion-blurred and out-of-focus blurred images. One important feature of this work is that the LSTM networks have been applied directly to the gray-scale document images to avoid error-prone binarization of blurred documents. Experiments are conducted on publicly available SmartDoc-QA dataset that contains a wide variety of image blur degradations. 
Our presented system achieves 12.3% character error rate on the test documents, which is an over three-fold reduction in the error rate (38.9%) of the best-performing contemporary OCR system (ABBYY Fine Reader) on the same data.","PeriodicalId":197359,"journal":{"name":"2016 12th IAPR Workshop on Document Analysis Systems (DAS)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125716320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Document Image Quality Assessment Based on Texture Similarity Index","authors":"Alireza Alaei, Donatello Conte, M. Blumenstein, R. Raveaux","doi":"10.1109/DAS.2016.33","DOIUrl":"https://doi.org/10.1109/DAS.2016.33","url":null,"abstract":"In this paper, a full reference document image quality assessment (FR DIQA) method using texture features is proposed. Local binary patterns (LBP) as texture features are extracted at the local and global levels for each image. For each extracted LBP feature set, a similarity measure called the LBP similarity index (LBPSI) is computed. A weighting strategy is further proposed to improve the LBPSI obtained based on local LBP features. The LBPSIs computed for both local and global features are then combined to get the final LBPSI, which also provides the best performance for DIQA. To evaluate the proposed method, two different datasets were used. The first dataset is composed of document images, whereas the second one includes natural scene images. The mean human opinion scores (MHOS) were considered as ground truth for performance evaluation. The results obtained from the proposed LBPSI method indicate a significant improvement in automatically/accurately predicting image quality, especially on the document image-based dataset.","PeriodicalId":197359,"journal":{"name":"2016 12th IAPR Workshop on Document Analysis Systems (DAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128971805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Compliant Document Image Classification System Based on One-Class Classifier","authors":"Nicolas Sidère, Jean-Yves Ramel, Sabine Barrat, V. P. d'Andecy, S. Kebairi","doi":"10.1109/DAS.2016.55","DOIUrl":"https://doi.org/10.1109/DAS.2016.55","url":null,"abstract":"Document image classification in a professional context requires to respect some constraints such as dealing with a large variability of documents and/or number of classes. Whereas most methods deal with all classes at the same time, we answer this problem by presenting a new compliant system based on the specialization of the features and the parametrization of the classifier separately, class per class. We first compute a generalized vector of features based on global image characterization and structural primitives. Then, for each class, the feature vector is specialized by ranking the features according a stability score. Finally, a one-class K-nn classifier is trained using these specific features. Conducted experiments reveal good classification rates, proving the ability of our system to deal with a large range of documents classes.","PeriodicalId":197359,"journal":{"name":"2016 12th IAPR Workshop on Document Analysis Systems (DAS)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134458771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Error Detection in Indic OCRs","authors":"V. Vinitha, C. V. Jawahar","doi":"10.1109/DAS.2016.31","DOIUrl":"https://doi.org/10.1109/DAS.2016.31","url":null,"abstract":"A good post processing module is an indispensable part of an OCR pipeline. In this paper, we propose a novel method for error detection in Indian language OCR output. Our solution uses a recurrent neural network (RNN) for classification of a word as an error or not. We propose a generic error detection method and demonstrate its effectiveness on four popular Indian languages. We divide the words into their constituent aksharas and use their bigram and trigram level information to build a feature representation. In order to train the classifier on incorrect words, we use the mis-recognized words in the output of the OCR. In addition to RNN, we also explore the effectiveness of a generative model such as GMM for our task and demonstrate an improved performance by combining both the approaches. We tested our method on four popular Indian languages and report an average error detection performance above 80%.","PeriodicalId":197359,"journal":{"name":"2016 12th IAPR Workshop on Document Analysis Systems (DAS)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129723516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Interactive Transcription System of Census Records Using Word-Spotting Based Information Transfer","authors":"J. M. Romeu, A. Fornés, J. Lladós","doi":"10.1109/DAS.2016.47","DOIUrl":"https://doi.org/10.1109/DAS.2016.47","url":null,"abstract":"This paper presents a system to assist in the transcription of historical handwritten census records in a crowdsourcing platform. Census records have a tabular structured layout. They consist in a sequence of rows with information of homes ordered by street address. For each household snippet in the page, the list of family members is reported. The censuses are recorded in intervals of a few years and the information of individuals in each household is quite stable from a point in time to the next one. This redundancy is used to assist the transcriber, so the redundant information is transferred from the census already transcribed to the next one. Household records are aligned from one year to the next one using the knowledge of the ordering by street address. Given an already transcribed census, a query by string word spotting is applied. Thus, names from the census in time t are used as queries in the corresponding home record in time t+1. Since the search is constrained, the obtained precision-recall values are very high, with an important reduction in the transcription time. 
The proposed system has been tested in a real citizen-science experience where non expert users transcribe the census data of their home town.","PeriodicalId":197359,"journal":{"name":"2016 12th IAPR Workshop on Document Analysis Systems (DAS)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129831871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effective Candidate Component Extraction for Text Localization in Born-Digital Images by Combining Text Contours and Stroke Interior Regions","authors":"Kai Chen, Fei Yin, Cheng-Lin Liu","doi":"10.1109/DAS.2016.30","DOIUrl":"https://doi.org/10.1109/DAS.2016.30","url":null,"abstract":"Extracting candidate text connected components (CCs) is critical for CC-based text localization. Based on the observation that text strokes in born-digital images mostly have complete contours and the text pixels have high contrast with the adjacent non-text pixels, we propose a method to extract candidate text CCs by combining text contours and stroke interior regions. After segmenting the image into non-smooth and smooth regions based on local contrast, text contour pixels in non-smooth regions are detached from adjacent non-text pixels by local binarization. Then, obvious non-text contours can be removed according to the spatial relationship of text and non-text contours. While smooth regions include stroke interior regions and non-text smooth regions, some non-text smooth regions can be easily removed because they are not surrounded by candidate text contours. At last, candidate text contours and stroke interior regions are combined to generate candidate text CCs. The CCs undergo CC filtering, text line grouping and line classification to give the text localization result. 
Experimental results on the born-digital dataset of ICDAR2013 robust reading competition demonstrate the efficiency and superiority of the proposed method.","PeriodicalId":197359,"journal":{"name":"2016 12th IAPR Workshop on Document Analysis Systems (DAS)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127488207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quality Prediction System for Large-Scale Digitisation Workflows","authors":"C. Clausner, S. Pletschacher, A. Antonacopoulos","doi":"10.1109/DAS.2016.82","DOIUrl":"https://doi.org/10.1109/DAS.2016.82","url":null,"abstract":"The feasibility of large-scale OCR projects can so far only be assessed by running pilot studies on subsets of the target document collections and measuring the success of different workflows based on precise ground truth, which can be very costly to produce in the required volume. The premise of this paper is that, as an alternative, quality prediction may be used to approximate the success of a given OCR workflow. A new system is thus presented where a classifier is trained using metadata, image and layout features in combination with measured success rates (based on minimal ground truth). Subsequently, only document images are required as input for the numeric prediction of the quality score (no ground truth required). This way, the system can be applied to any number of similar (unseen) documents in order to assess their suitability for being processed using the particular workflow. The usefulness of the system has been validated using a realistic dataset of historical newspaper pages.","PeriodicalId":197359,"journal":{"name":"2016 12th IAPR Workshop on Document Analysis Systems (DAS)","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126241189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semi-automatic Text and Graphics Extraction of Manga Using Eye Tracking Information","authors":"Christophe Rigaud, Thanh Nam Le, J. Burie, J. Ogier, Shoya Ishimaru, M. Iwata, K. Kise","doi":"10.1109/DAS.2016.72","DOIUrl":"https://doi.org/10.1109/DAS.2016.72","url":null,"abstract":"The popularity of storing, distributing and reading comic books electronically has made the task of comics analysis an interesting research problem. Different work have been carried out aiming at understanding their layout structure and the graphic content. However the results are still far from universally applicable, largely due to the huge variety in expression styles and page arrangement, especially in manga (Japanese comics). In this paper, we propose a comic image analysis approach using eye-tracking data recorded during manga reading sessions. As humans are extremely capable of interpreting the structured drawing content, and show different reading behaviors based on the nature of the content, their eye movements follow distinguishable patterns over text or graphic regions. Therefore, eye gaze data can add rich information to the understanding of the manga content. Experimental results show that the fixations and saccades indeed form consistent patterns among readers, and can be used for manga textual and graphical analysis.","PeriodicalId":197359,"journal":{"name":"2016 12th IAPR Workshop on Document Analysis Systems (DAS)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130239364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}