{"title":"A Table Detection Method for PDF Documents Based on Convolutional Neural Networks","authors":"Leipeng Hao, Liangcai Gao, Xiaohan Yi, Zhi Tang","doi":"10.1109/DAS.2016.23","DOIUrl":"https://doi.org/10.1109/DAS.2016.23","url":null,"abstract":"Because of the better performance of deep learning on many computer vision tasks, researchers in the area of document analysis and recognition have begun to adopt this technique in their work. In this paper, we propose a novel method for table detection in PDF documents based on convolutional neural networks, one of the most popular deep learning models. In the proposed method, some table-like areas are selected first by some loose rules, and then the convolutional networks are built and refined to determine whether the selected areas are tables or not. Besides, the visual features of table areas are directly extracted and utilized through the convolutional networks, while the non-visual information (e.g. characters, rendering instructions) contained in original PDF documents is also taken into consideration to help achieve better recognition results. The primary experimental results show that the approach is effective in table detection.","PeriodicalId":197359,"journal":{"name":"2016 12th IAPR Workshop on Document Analysis Systems (DAS)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129213323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of the Stability of Four Document Segmentation Algorithms","authors":"Sébastien Eskenazi, Petra Gomez-Krämer, J. Ogier","doi":"10.1109/DAS.2016.25","DOIUrl":"https://doi.org/10.1109/DAS.2016.25","url":null,"abstract":"The importance of having stable information extraction algorithms for security related applications and more generally for industrial use cases has been recently highlighted. Stability is what makes an algorithm reliable as it gives a guarantee that the results will be reproducible on similar data. Without it, security criteria such as the probability of false positives cannot be quantified. As a consequence, no security application can be built from an unstable algorithm. In a document verification framework, the probability of false positives indicates the probability that two different results are given for two copies of the same document. This paper builds on our previous work about a stable layout descriptor to study the stability of four segmentation algorithms. We consider that a segmentation algorithm is stable if it produces the same layout for all copies of the same document. The algorithms studied are two versions of PAL, Voronoi, and JSEG. We compare the stability of the different algorithms and study the factors influencing their stability.","PeriodicalId":197359,"journal":{"name":"2016 12th IAPR Workshop on Document Analysis Systems (DAS)","volume":"129 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132906995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"New Sharpness Features for Image Type Classification Based on Textual Information","authors":"R. K. Srinivas, P. Shivakumara, G. Kumar, U. Pal, Tong Lu","doi":"10.1109/DAS.2016.18","DOIUrl":"https://doi.org/10.1109/DAS.2016.18","url":null,"abstract":"Achieving good recognition results from a single method for text lines in video/natural scene images captured by high resolution cameras or low resolution mobile cameras, and images in web pages, is often hard. In this paper, we propose new sharpness based features of textual portion of each input text line image using HSI color space for the classification of an input image into one of the four classes (video, scene, mobile or born digital). This helps in choosing an appropriate method based on the class type of the input text for its improved recognition rate. For a given input text line image, the proposed method obtains H, S and I images. Then Canny edge images are obtained for H, S and I spaces, which results in text candidates. We perform sliding window operation over the text candidate image of each text line of each color space to estimate new sharpness by calculating stroke width and gradient information. The sharpness values of the text lines of the three color spaces are then fed to k-means clustering with maximum, minimum and average guesses, which results in three respective clusters. The mean of each cluster for respective color spaces outputs a feature vector having nine feature values for image classification with the help of an SVM classifier. Experimental results on standard datasets, namely, ICDAR 2013, ICDAR 2015 video, ICDAR 2015 natural scene data, ICDAR 2013 born digital data and the images captured by a mobile camera (our own data) show that the proposed classification method helps in improving recognition results.","PeriodicalId":197359,"journal":{"name":"2016 12th IAPR Workshop on Document Analysis Systems (DAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130362760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Document Image Segmentation Representation by Approximating Minimum-Link Polygons","authors":"George Retsinas, G. Louloudis, N. Stamatopoulos, B. Gatos","doi":"10.1109/DAS.2016.59","DOIUrl":"https://doi.org/10.1109/DAS.2016.59","url":null,"abstract":"The result of a document image segmentation task, e.g. text line or word segmentation, is usually a labeled image with each label corresponding to a different segmented region. For many applications, the segmented regions need to be stored and represented in an efficient way, using simple geometric shapes. A challenging task is to restrict all pixels corresponding to a specific label inside a polygon with a minimum number of vertices. Such a polygon promotes the description simplicity and the storage efficiency, while providing a much more user-friendly representation that can be edited easily. The proposed method is a cost-effective approximation of the minimum-edges polygon problem, computing a contour enclosing only pixels of a certain label and using a greedy algorithm in order to reduce the contour into a minimum-link polygon that retains the separability property between the labeled set of pixels.","PeriodicalId":197359,"journal":{"name":"2016 12th IAPR Workshop on Document Analysis Systems (DAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129549243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Named Entity Recognition from Unstructured Handwritten Document Images","authors":"Chandranath Adak, B. Chaudhuri, M. Blumenstein","doi":"10.1109/DAS.2016.15","DOIUrl":"https://doi.org/10.1109/DAS.2016.15","url":null,"abstract":"Named entity recognition is an important topic in the field of natural language processing, whereas in document image processing, such recognition is quite challenging without employing any linguistic knowledge. In this paper we propose an approach to detect named entities (NEs) directly from offline handwritten unstructured document images without explicit character/word recognition, and with very little aid from natural language and script rules. At the preprocessing stage, the document image is binarized, and then the text is segmented into words. The slant/skew/baseline corrections of the words are also performed. After preprocessing, the words are sent for NE recognition. We analyze the structural and positional characteristics of NEs and extract some relevant features from the word image. Then the BLSTM neural network is used for NE recognition. Our system also contains a post-processing stage to reduce the true NE rejection rate. The proposed approach produces encouraging results on both historical and modern document images, including those from an Australian archive, which are reported here for the very first time.","PeriodicalId":197359,"journal":{"name":"2016 12th IAPR Workshop on Document Analysis Systems (DAS)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127487005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised Word Clustering Using Deep Features","authors":"Mandar Kulkarni, S. Karande, S. Lodha","doi":"10.1109/DAS.2016.14","DOIUrl":"https://doi.org/10.1109/DAS.2016.14","url":null,"abstract":"Digitization is crucial especially in the Indian context. OCR engines fail on Indian scripts mainly because character segmentation is non-trivial. Even word based recognition approaches suffer from the issues such as time degradations, word segmentation errors, font style/size variations. In this paper, we propose a deep learning architecture based approach for unsupervised word clustering. An edge responsive untrained Convolutional Neural Network (CNN) is used as a feature extractor. Graph connected component analysis is applied on the similarity graph computed from the word features. Our approach inherently detects similar shape patterns at word level and hence, it is language agnostic. We validated our approach against multiple state of art word matching techniques. Experimental results show that our approach significantly outperforms all of them on variety of data sets. In addition, the approach is observed to be robust to word segmentation errors, font style/size variations.","PeriodicalId":197359,"journal":{"name":"2016 12th IAPR Workshop on Document Analysis Systems (DAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131152228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Hyperlinking of Engineering Drawing Documents","authors":"P. Banerjee, Sumit Choudhary, Supriya Das, Himadri Majumdar, Rahul Roy, B. Chaudhuri","doi":"10.1109/DAS.2016.76","DOIUrl":"https://doi.org/10.1109/DAS.2016.76","url":null,"abstract":"In construction or manufacturing industry, engineering drawings are used as blueprint or plan documents to facilitate the construction or manufacturing process. A fairly large construction project involves very large number of these documents, divided into different sub-sections. An engineer or architect often needs to refer different documents while preparing a new one or marking some irregularity in some document. Therefore they need to navigate through different files. It becomes an extremely difficult and time consuming task to move from one file to another in an interactive way. This paper describes an automated technique to access information from the existing drawing documents and create hyperlinks in order to enable the engineers to quickly navigate between files. The overall accuracy of our system for a class of documents is a decent 94.46%.","PeriodicalId":197359,"journal":{"name":"2016 12th IAPR Workshop on Document Analysis Systems (DAS)","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123987606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recognition of Greek Polytonic on Historical Degraded Texts Using HMMs","authors":"V. Katsouros, V. Papavassiliou, Fotini Simistira, B. Gatos","doi":"10.1109/DAS.2016.60","DOIUrl":"https://doi.org/10.1109/DAS.2016.60","url":null,"abstract":"Optical Character Recognition (OCR) of ancient Greek polytonic scripts is a challenging task due to the large number of character classes, resulting from variations of diacritical marks on the vowel letters. Classical OCR systems require a character segmentation phase, which in the case of Greek polytonic scripts is the main source of errors that finally affects the overall OCR performance. This paper suggests a character segmentation free HMM-based recognition system and compares its performance with other commercial, open source, and state-of-the art OCR systems. The evaluation has been carried out on a challenging novel dataset of Greek polytonic degraded texts and has shown that HMM-based OCR yields character and word level error rates of 8.61% and 25.30% respectively, which outperforms most of the available OCR systems and it is comparable with the performance of the state-of-the-art system based on LSTM Networks proposed recently.","PeriodicalId":197359,"journal":{"name":"2016 12th IAPR Workshop on Document Analysis Systems (DAS)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124567296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large Scale Continuous Dating of Medieval Scribes Using a Combined Image and Language Model","authors":"Fredrik Wahlberg, Lasse Mårtensson, Anders Brun","doi":"10.1109/DAS.2016.71","DOIUrl":"https://doi.org/10.1109/DAS.2016.71","url":null,"abstract":"Finding the production date of a pre-modern manuscript is commonly a long process in historical research, requiring days of work from highly specialised experts. In this paper, we present an automatic dating method based on modelling both the language and the image data. By creating a statistical model over the changes in the pen strokes and short character sequences in the transcribed text, a combination of multiple estimators give a distribution over the time line for each manuscript. We have evaluated our estimation scheme on the medieval charter collection \"Svenskt Diplomatariums huvudkartotek\" (SDHK), including more than 5300 transcribed charters from the period 1135 - 1509. Our system is capable of achieving a median absolute error of 12 years, where the only human input is a transcription of the charter text. Since reading and transcribing the text is a skill that many researchers and students have, compared to the more specialized skill of dating medieval manuscripts based on palaeographical expertise, we find our novel approach suitable for helping individual researchers to date collections of manuscript pages. For larger collections, transcriptions could also be collected using crowd sourcing.","PeriodicalId":197359,"journal":{"name":"2016 12th IAPR Workshop on Document Analysis Systems (DAS)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114440021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Adaptive Zoning Technique for Word Spotting Using Dynamic Time Warping","authors":"A. Papandreou, B. Gatos, Konstantinos Zagoris","doi":"10.1109/DAS.2016.79","DOIUrl":"https://doi.org/10.1109/DAS.2016.79","url":null,"abstract":"Zoning features have been proved one of the most efficient statistical features which provide high speed and low complexity word matching. They are calculated by the density of pixels or pattern characteristics in several zones that the pattern frame is divided. In this paper, an adaptive zoning technique for efficient word spotting is introduced. The main idea is that the zoning features are extracted after cutting the query word in vertical zones, according to its length and pixel distribution along the horizontal axis, and adjusting these boundaries optimally with the corresponding zones in the candidate match-word using Dynamic Time Warping (DTW). This adjustment is performed by coupling every zone of the query word to the corresponding zone of each candidate match-word with the use of the corresponding warping matrix. This process absorbs the ambiguities between the query and the candidate match words and due to this fact it can be applied to both machine-printed and handwritten document images. The proposed word spotting technique is tested using the pixel density as a characteristic feature in every zone and an improvement is recorded compared to other state-of-the-art methods.","PeriodicalId":197359,"journal":{"name":"2016 12th IAPR Workshop on Document Analysis Systems (DAS)","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125436124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}