2016 12th IAPR Workshop on Document Analysis Systems (DAS): Latest Papers, Page 2

A Table Detection Method for PDF Documents Based on Convolutional Neural Networks

2016 12th IAPR Workshop on Document Analysis Systems (DAS) Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.23

Leipeng Hao, Liangcai Gao, Xiaohan Yi, Zhi Tang

Citations: 99

Evaluation of the Stability of Four Document Segmentation Algorithms

2016 12th IAPR Workshop on Document Analysis Systems (DAS) Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.25

Sébastien Eskenazi, Petra Gomez-Krämer, J. Ogier

Citations: 3

New Sharpness Features for Image Type Classification Based on Textual Information

2016 12th IAPR Workshop on Document Analysis Systems (DAS) Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.18

R. K. Srinivas, P. Shivakumara, G. Kumar, U. Pal, Tong Lu

Abstract: Achieving good recognition results from a single method for text lines in video/natural scene images captured by high resolution cameras or low resolution mobile cameras, and images in web pages, is often hard. In this paper, we propose new sharpness based features of textual portion of each input text line image using HSI color space for the classification of an input image into one of the four classes (video, scene, mobile or born digital). This helps in choosing an appropriate method based on the class type of the input text for its improved recognition rate. For a given input text line image, the proposed method obtains H, S and I images. Then Canny edge images are obtained for H, S and I spaces, which results in text candidates. We perform sliding window operation over the text candidate image of each text line of each color space to estimate new sharpness by calculating stroke width and gradient information. The sharpness values of the text lines of the three color spaces are then fed to k-means clustering with maximum, minimum and average guesses, which results in three respective clusters. The mean of each cluster for respective color spaces outputs a feature vector having nine feature values for image classification with the help of an SVM classifier. Experimental results on standard datasets, namely, ICDAR 2013, ICDAR 2015 video, ICDAR 2015 natural scene data, ICDAR 2013 born digital data and the images captured by a mobile camera (our own data) show that the proposed classification method helps in improving recognition results.

Citations: 6
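
The clustering step described in the abstract above (k-means seeded with the maximum, minimum and average sharpness values, then one mean per cluster for each of the H, S and I channels) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `kmeans_1d_three` and `nine_feature_vector`, the iteration cap, and the handling of empty clusters are all hypothetical, and the sharpness values themselves are taken as given.

```python
from typing import List, Sequence


def kmeans_1d_three(values: Sequence[float], iters: int = 50) -> List[float]:
    """1-D k-means with k=3, seeded with the min, mean and max of the data.

    Hypothetical helper: the abstract only says the initial guesses are the
    maximum, minimum and average; everything else here is an assumption.
    """
    centers = [min(values), sum(values) / len(values), max(values)]
    for _ in range(iters):
        clusters: List[List[float]] = [[], [], []]
        for v in values:
            # Assign each value to its nearest current center
            nearest = min(range(3), key=lambda k: abs(v - centers[k]))
            clusters[nearest].append(v)
        # Recompute centers; keep the old center if a cluster ends up empty
        new_centers = [
            sum(c) / len(c) if c else centers[k] for k, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def nine_feature_vector(
    h: Sequence[float], s: Sequence[float], i: Sequence[float]
) -> List[float]:
    """Concatenate the three cluster means of each channel (3 x 3 = 9 values)."""
    return kmeans_1d_three(h) + kmeans_1d_three(s) + kmeans_1d_three(i)
```

The resulting nine-value vector is what would then be passed to a classifier such as an SVM; that stage is left out here.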

Efficient Document Image Segmentation Representation by Approximating Minimum-Link Polygons

2016 12th IAPR Workshop on Document Analysis Systems (DAS) Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.59

George Retsinas, G. Louloudis, N. Stamatopoulos, B. Gatos

Citations: 1

Named Entity Recognition from Unstructured Handwritten Document Images

2016 12th IAPR Workshop on Document Analysis Systems (DAS) Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.15

Chandranath Adak, B. Chaudhuri, M. Blumenstein

Citations: 18

Unsupervised Word Clustering Using Deep Features

2016 12th IAPR Workshop on Document Analysis Systems (DAS) Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.14

Mandar Kulkarni, S. Karande, S. Lodha

Citations: 4

Automatic Hyperlinking of Engineering Drawing Documents

2016 12th IAPR Workshop on Document Analysis Systems (DAS) Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.76

P. Banerjee, Sumit Choudhary, Supriya Das, Himadri Majumdar, Rahul Roy, B. Chaudhuri

Citations: 10

Recognition of Greek Polytonic on Historical Degraded Texts Using HMMs

2016 12th IAPR Workshop on Document Analysis Systems (DAS) Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.60

V. Katsouros, V. Papavassiliou, Fotini Simistira, B. Gatos

Citations: 8

Large Scale Continuous Dating of Medieval Scribes Using a Combined Image and Language Model

2016 12th IAPR Workshop on Document Analysis Systems (DAS) Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.71

Fredrik Wahlberg, Lasse Mårtensson, Anders Brun

Abstract: Finding the production date of a pre-modern manuscript is commonly a long process in historical research, requiring days of work from highly specialised experts. In this paper, we present an automatic dating method based on modelling both the language and the image data. By creating a statistical model over the changes in the pen strokes and short character sequences in the transcribed text, a combination of multiple estimators give a distribution over the time line for each manuscript. We have evaluated our estimation scheme on the medieval charter collection "Svenskt Diplomatariums huvudkartotek" (SDHK), including more than 5300 transcribed charters from the period 1135 - 1509. Our system is capable of achieving a median absolute error of 12 years, where the only human input is a transcription of the charter text. Since reading and transcribing the text is a skill that many researchers and students have, compared to the more specialized skill of dating medieval manuscripts based on palaeographical expertise, we find our novel approach suitable for helping individual researchers to date collections of manuscript pages. For larger collections, transcriptions could also be collected using crowd sourcing.

Citations: 12
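
The dating abstract above mentions combining several estimators into a distribution over the timeline and reports a median absolute error in years. A rough sketch of both ideas is given below. The combination rule (multiplying per-year probabilities and renormalising) is an assumption made purely for illustration, since the abstract does not say how the estimators are combined; the function names and the toy numbers are likewise hypothetical.

```python
from statistics import median
from typing import Dict, List, Sequence


def combine_estimators(dists: Sequence[Dict[int, float]]) -> Dict[int, float]:
    """Combine per-year distributions by product and renormalisation.

    Assumption: estimators are treated as independent; the paper may use a
    different rule.
    """
    years = set.intersection(*(set(d) for d in dists))
    product = {}
    for year in years:
        p = 1.0
        for d in dists:
            p *= d[year]
        product[year] = p
    total = sum(product.values())
    if total == 0:
        raise ValueError("estimators assign no common probability mass")
    return {year: p / total for year, p in product.items()}


def point_estimate(dist: Dict[int, float]) -> int:
    """Pick the most probable year from a combined distribution."""
    return max(dist, key=dist.get)


def median_absolute_error(true_years: List[int], predicted: List[int]) -> float:
    """Median of |true - predicted| over all manuscripts, in years."""
    return median(abs(t - p) for t, p in zip(true_years, predicted))


# Toy example with made-up distributions for a single charter
image_model = {1300: 0.5, 1350: 0.5}
text_model = {1300: 0.2, 1350: 0.8}
combined = combine_estimators([image_model, text_model])
best_year = point_estimate(combined)
```

The median is less sensitive than the mean to a few badly misdated charters, which is one plausible reason to report it for a collection of several thousand documents.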

An Adaptive Zoning Technique for Word Spotting Using Dynamic Time Warping

2016 12th IAPR Workshop on Document Analysis Systems (DAS) Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.79

A. Papandreou, B. Gatos, Konstantinos Zagoris

Abstract: Zoning features have been proved one of the most efficient statistical features which provide high speed and low complexity word matching. They are calculated by the density of pixels or pattern characteristics in several zones that the pattern frame is divided. In this paper, an adaptive zoning technique for efficient word spotting is introduced. The main idea is that the zoning features are extracted after cutting the query word in vertical zones, according to its length and pixel distribution along the horizontal axis, and adjusting these boundaries optimally with the corresponding zones in the candidate match-word using Dynamic Time Warping (DTW). This adjustment is performed by coupling every zone of the query word to the corresponding zone of each candidate match-word with the use of the corresponding warping matrix. This process absorbs the ambiguities between the query and the candidate match words and due to this fact it can be applied to both machine-printed and handwritten document images. The proposed word spotting technique is tested using the pixel density as a characteristic feature in every zone and an improvement is recorded compared to other state-of-the-art methods.

Citations: 6
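
The boundary adjustment described in the adaptive zoning abstract above can be illustrated with a textbook DTW over two column-wise pixel-density profiles. This is a minimal sketch, not the paper's method: it assumes each word image has already been reduced to a 1-D profile (one density value per pixel column), uses absolute difference as the local cost, and maps each query zone boundary to the first candidate column it is aligned with. The names `dtw_path` and `map_boundaries` are hypothetical.

```python
from typing import List, Sequence, Tuple


def dtw_path(
    a: Sequence[float], b: Sequence[float]
) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic DTW between two 1-D sequences.

    Returns the total alignment cost and the warping path as (i, j) pairs.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    # Backtrack from the end of both sequences to recover the path
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


def map_boundaries(
    path: Sequence[Tuple[int, int]], query_bounds: Sequence[int]
) -> List[int]:
    """Map query zone start columns to candidate columns via the warping path."""
    first_match = {}
    for qi, cj in path:
        first_match.setdefault(qi, cj)  # keep the first aligned candidate column
    return [first_match[q] for q in query_bounds]


# Toy profiles: the candidate is a stretched version of the query
query = [0.0, 1.0, 1.0, 0.0]
candidate = [0.0, 1.0, 1.0, 1.0, 0.0]
total_cost, path = dtw_path(query, candidate)
candidate_bounds = map_boundaries(path, [0, 3])
```

Once the boundaries are mapped, per-zone pixel densities can be computed on both words using matching zones, so that a stretched or compressed candidate is compared zone by zone rather than column by column.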