2014 11th IAPR International Workshop on Document Analysis Systems: Latest Publications

Ground-Truth and Performance Evaluation for Page Layout Analysis of Born-Digital Documents
2014 11th IAPR International Workshop on Document Analysis Systems Pub Date : 2014-04-07 DOI: 10.1109/DAS.2014.37
Xin Tao, Zhi Tang, Canhui Xu, Liangcai Gao
Abstract: This paper proposes a new dataset for page layout analysis of born-digital documents. Document contents are extracted uniformly into an XML-based data format that separates raw data from structure data. Using a self-developed ground-truthing tool, a public dataset is constructed from document resources of diverse styles. Automatic performance evaluation methods are adapted to different scenarios, taking both physical segmentation and logical labeling into account. Applications of the proposed dataset show that it is suitable for evaluating various layout analysis tasks.
Citations: 3
Printer Identification Using Supervised Learning for Document Forgery Detection
2014 11th IAPR International Workshop on Document Analysis Systems Pub Date : 2014-04-07 DOI: 10.1109/DAS.2014.48
Sarah Elkasrawi, F. Shafait
Abstract: Identifying the source printer of a document is important in forgery detection. The larger the number of documents to be investigated, the less time-efficient manual examination becomes. Assuming the document in question was scanned, the accuracy of automatic forgery detection depends on the scanning resolution. Low-resolution (100-200 dpi) and common-resolution (300-400 dpi) scans have less distinctive features than high-end scans, yet they are more widespread in offices. In this paper, we propose a method to automatically identify source printers using common-resolution scans (400 dpi). Our method relies on the distinctive noise produced by printers. Independent of the document content or size, each printer produces noise that depends on its printing technique, its brand, and slight differences due to manufacturing imperfections. Experiments were carried out on a set of 400 documents of similar structure printed on 20 different printers. The documents were scanned at 400 dpi using the same scanner. Assuming constant printer settings, the overall classification accuracy was 76.75%.
Citations: 51
Robustness Assessment of Texture Features for the Segmentation of Ancient Documents
2014 11th IAPR International Workshop on Document Analysis Systems Pub Date : 2014-04-07 DOI: 10.1109/DAS.2014.22
Maroua Mehri, V. C. Kieu, Mohamed Mhiri, P. Héroux, Petra Gomez-Krämer, M. Mahjoub, R. Mullot
Abstract: For the segmentation of digitized ancient document images, texture feature analysis has been shown to be a consistent choice for segmenting a page layout under significant and varied degradations. In addition, texture-based approaches have been shown to work effectively without any hypothesis about the document structure, the document model, or the typographical parameters. Thus, by investigating the use of texture as a tool for automatically segmenting images, we propose to search for homogeneous and similar content regions by analyzing texture features based on a multiresolution analysis. The preliminary results show the effectiveness of the texture features extracted from the autocorrelation function, the Grey Level Co-occurrence Matrix (GLCM), and Gabor filters. To assess the robustness of the proposed texture-based approaches, images are generated under numerous degradation models, and two image enhancement algorithms (non-local means filtering and superpixel techniques) are evaluated with several accuracy metrics. This study shows that texture feature extraction for segmentation is robust to noise and that a denoising step is unnecessary.
Citations: 7
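The GLCM named in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the single horizontal offset, the 4x4 toy patch, and the function names `glcm`, `contrast`, and `homogeneity` are assumptions chosen for illustration, and the multiresolution analysis described in the abstract is not reproduced.

```python
from collections import Counter


def glcm(image, dx=1, dy=0, symmetric=True):
    """Normalised grey-level co-occurrence matrix for one pixel offset.

    `image` is a list of rows of integer grey levels. The result maps
    (level_a, level_b) pairs to their relative frequency.
    """
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[(a, b)] += 1
                if symmetric:
                    # Count the pair in both directions.
                    counts[(b, a)] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def contrast(p):
    """Weights each pair by the squared difference of its grey levels."""
    return sum(prob * (i - j) ** 2 for (i, j), prob in p.items())


def homogeneity(p):
    """Rewards pairs whose grey levels are close to each other."""
    return sum(prob / (1 + abs(i - j)) for (i, j), prob in p.items())


# Hypothetical 4x4 patch with four grey levels (0..3).
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(patch)
print(round(contrast(p), 4), round(homogeneity(p), 4))
```

In practice such statistics would be computed per sliding window and per offset, giving one feature vector per pixel or block for a later clustering or classification step.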
An on-line platform for ground truthing and performance evaluation of text extraction systems
2014 11th IAPR International Workshop on Document Analysis Systems Pub Date : 2014-04-07 DOI: 10.1109/DAS.2014.49
Dimosthenis Karatzas, Sergi Robles Mestre, L. G. I. Bigorda
Abstract: This paper presents a set of on-line software tools for creating ground truth and calculating performance evaluation metrics for text extraction tasks such as localization, segmentation and recognition. The platform supports the definition of comprehensive ground truth information at different text representation levels, and it offers centralised management and quality control of the ground truthing effort. It implements a range of state-of-the-art performance evaluation algorithms and offers functionality for the definition of evaluation scenarios, on-line calculation of various performance metrics, and visualisation of the results. The presented platform, which forms the backbone of the ICDAR 2011 (challenge 1) and 2013 (challenges 1 and 2) Robust Reading competitions, is now available for public use.
Citations: 13
Empirical Evaluation of CRF-Based Bibliography Extraction from Reference Strings
2014 11th IAPR International Workshop on Document Analysis Systems Pub Date : 2014-04-07 DOI: 10.1109/DAS.2014.64
Manabu Ohta, Daiki Arauchi, A. Takasu, J. Adachi
Abstract: This paper reports an empirical evaluation of a CRF-based bibliography parser we have developed for reference strings of research papers. The parser uses a conditional random field (CRF) to estimate the correct bibliographic label, such as an author's name or a title, for each token in a reference string. We applied the parser, which is designed specifically for reference strings, to three academic journals published in Japan: one English and two Japanese. Experiments showed that (i) the parser correctly parsed 90% to 94% of reference strings, depending on the journal, and (ii) segmentation errors induced by tokenization considerably degraded the final parsing accuracy. This paper also discusses some future directions for bibliography extraction based on a detailed analysis of the experiments.
Citations: 5
Improving Classification of an Industrial Document Image Database by Combining Visual and Textual Features
2014 11th IAPR International Workshop on Document Analysis Systems Pub Date : 2014-04-07 DOI: 10.1109/DAS.2014.44
Olivier Augereau, N. Journet, A. Vialard, J. Domenger
Abstract: The main contribution of this paper is a new method for classifying document images by combining textual features extracted with the Bag of Words (BoW) technique and visual features extracted with the Bag of Visual Words (BoVW) technique. BoVW is widely used within the computer vision community for scene classification and object recognition, but few applications to the classification of entire document images have been proposed. While previous attempts to combine visual and textual features with the Borda-count technique have shown disappointing results, we propose here a combination-through-learning approach. Experiments conducted on an industrial database of 1925 document images reveal that this fusion scheme significantly improves classification performance. Our concluding contribution deals with choosing and tuning the BoW and/or BoVW techniques in an industrial context.
Citations: 21
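For context, the Borda-count technique that the abstract above contrasts with its learning-based fusion can be sketched as follows. This shows only the baseline rank-fusion rule, not the authors' method; the class names and the two example rankings are hypothetical.

```python
def borda_fuse(rankings):
    """Fuse several best-first class rankings with the Borda count.

    A class at position k in a ranking of n classes receives n - 1 - k
    points. Classes are returned by decreasing total, with ties broken
    alphabetically so the output is deterministic.
    """
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, label in enumerate(ranking):
            scores[label] = scores.get(label, 0) + (n - 1 - position)
    return sorted(scores, key=lambda label: (-scores[label], label))


# Hypothetical rankings from a textual (BoW) and a visual (BoVW) classifier.
textual_ranking = ["invoice", "form", "letter"]
visual_ranking = ["form", "letter", "invoice"]

fused = borda_fuse([textual_ranking, visual_ranking])
print(fused)
```

Because the rule uses only rank positions, it ignores how confident each classifier is, which is one plausible reason a learned combination can do better.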
Text Detection Using Delaunay Triangulation in Video Sequence
2014 11th IAPR International Workshop on Document Analysis Systems Pub Date : 2014-04-07 DOI: 10.1109/DAS.2014.28
Liang Wu, P. Shivakumara, Tong Lu, C. Tan
Abstract: Text detection and tracking in video sequences is gaining interest due to the challenges posed by low resolution and complex backgrounds. This paper proposes a new method for text detection that estimates trajectories of text corners in a video sequence over time. Each trajectory is treated as one node of a graph over all trajectories, and Delaunay triangulation is used to obtain the edges that connect the nodes. To identify the edges that represent text regions, we propose four pruning criteria based on spatial proximity, motion coherence, local appearance and Canny rate. This results in several sub-graphs. We then use depth-first search to collect corner points, which essentially represent text candidates. False positives are eliminated using heuristics, and missing trajectories are recovered by tracking the corners across temporal frames. We test the method on different videos and compare it with existing results in terms of recall, precision and f-measure. Experimental results show that the proposed method is superior to existing methods.
Citations: 13
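The prune-then-search step described in the abstract above can be sketched under strong simplifications. The edge list below is hand-written and stands in for the output of a Delaunay triangulation, only the spatial-proximity criterion is applied (motion coherence, local appearance and Canny rate are omitted), and the threshold `max_dist` and both function names are assumptions.

```python
import math


def prune_edges(points, edges, max_dist):
    """Keep only the edges whose two endpoints are spatially close."""
    return [(a, b) for a, b in edges
            if math.dist(points[a], points[b]) <= max_dist]


def components(n_nodes, edges):
    """Group nodes into connected sub-graphs with an iterative depth-first search."""
    adjacency = {i: [] for i in range(n_nodes)}
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)
    seen, groups = set(), []
    for start in range(n_nodes):
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(group))
    return groups


# Hypothetical node positions (one node per corner trajectory) and edges.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (1, 3), (3, 4)]

kept = prune_edges(points, edges, max_dist=2.0)
groups = components(len(points), kept)
print(groups)
```

Each resulting group plays the role of one text candidate; the long edges that bridged the two clusters are removed by the proximity test before the search runs.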
OCR Performance Prediction Using a Bag of Allographs and Support Vector Regression
2014 11th IAPR International Workshop on Document Analysis Systems Pub Date : 2014-04-07 DOI: 10.1109/DAS.2014.72
T. Bhowmik, T. Paquet, N. Ragot
Abstract: In this paper, we describe a novel and simple technique for predicting OCR results without running any OCR. The technique uses a bag of allographs to characterize textual components. A support vector regression (SVR) technique is then used to build a predictor based on the bag of allographs. The performance of the system is evaluated on a corpus of historical documents. The proposed technique predicts OCR results on training and test documents within a standard deviation of 4.18% and 6.54%, respectively. The proposed system has been designed as a tool to assist the selection of corpora in libraries and to specify the typical performance that can be expected on the selection.
Citations: 6
A Typed and Handwritten Text Block Segmentation System for Heterogeneous and Complex Documents
2014 11th IAPR International Workshop on Document Analysis Systems Pub Date : 2014-04-07 DOI: 10.1109/DAS.2014.39
Philippine Barlas, Sébastien Adam, Clément Chatelain, T. Paquet
Abstract: This paper presents a Document Image Analysis (DIA) system able to extract homogeneous typed and handwritten text regions from complex-layout documents of various types. The method is based on two connected-component classification stages that successively discriminate text from non-text and typed from handwritten shapes, followed by an original block segmentation method based on the detection of white rectangles. We present the results obtained by the system during the first competition round of the MAURDOR campaign.
Citations: 25
Graph Model Optimization Based Historical Chinese Character Segmentation Method
2014 11th IAPR International Workshop on Document Analysis Systems Pub Date : 2014-04-07 DOI: 10.1109/DAS.2014.57
Jingning Ji, Liangrui Peng, Bohan Li
Abstract: Historical Chinese document recognition technology is important for digital libraries. However, historical Chinese character segmentation remains a difficult problem due to the complex structure of Chinese characters and the variety of writing styles. This paper presents a novel method for historical Chinese character segmentation based on a graph model. After a preliminary over-segmentation stage, the system applies a merging process. The candidate segmentation positions are represented as the nodes of a graph, and the merging process is treated as selecting an optimal path through the graph. The weight of each edge is calculated by a cost function that considers geometric features and recognition confidence. Experimental results show that the proposed method is effective, with a detection rate of 94.6% and an accuracy rate of 96.1% on a test set of practical historical Chinese document samples.
Citations: 7
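The idea in the abstract above of treating candidate cut positions as graph nodes and merging as an optimal-path search can be sketched with dynamic programming. This is an illustration under stated assumptions, not the authors' method: the cost function `toy_cost` uses only the distance from an assumed expected character width (no recognition confidence), and `max_span`, the cut positions, and all names are hypothetical.

```python
def best_segmentation(cuts, cost, max_span=3):
    """Cheapest path from the first to the last candidate cut.

    `cuts` are sorted candidate cut positions from over-segmentation.
    An edge (i, j) with i < j means the pieces between cuts[i] and
    cuts[j] are merged into one character at cost(cuts[i], cuts[j]).
    At most `max_span` consecutive pieces may be merged.
    """
    n = len(cuts)
    best = [float("inf")] * n
    prev = [None] * n
    best[0] = 0.0
    for j in range(1, n):
        for i in range(max(0, j - max_span), j):
            candidate = best[i] + cost(cuts[i], cuts[j])
            if candidate < best[j]:
                best[j] = candidate
                prev[j] = i
    # Walk back from the last cut to recover the chosen boundaries.
    path, k = [], n - 1
    while k is not None:
        path.append(cuts[k])
        k = prev[k]
    return best[-1], path[::-1]


def toy_cost(left, right, expected_width=20):
    """Geometric cost only: deviation from an assumed character width."""
    return abs((right - left) - expected_width)


# Hypothetical candidate cut positions (pixels) along a text line.
cuts = [0, 9, 20, 31, 40, 61]
total, chosen = best_segmentation(cuts, toy_cost)
print(total, chosen)
```

Because every edge points from an earlier cut to a later one, the graph is acyclic and a single left-to-right pass suffices; a cost that also includes recognition confidence would plug in through the `cost` argument.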