2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR): Latest Papers, Page 7

Improved Localization Accuracy by LocNet for Faster R-CNN Based Text Detection

2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2017-11-01 DOI: 10.1109/ICDAR.2017.155

Zhuoyao Zhong, Lei Sun, Qiang Huo

Citations: 43

Fully Convolutional Neural Networks for Newspaper Article Segmentation

2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2017-11-01 DOI: 10.1109/ICDAR.2017.75

B. Meier, Thilo Stadelmann, Jan Stampfli, M. Arnold, Mark Cieliebak

Citations: 31

Temporal Integration for Word-Wise Caption and Scene Text Identification

2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2017-11-01 DOI: 10.1109/ICDAR.2017.65

Sangheeta Roy, P. Shivakumara, U. Pal, Tong Lu, A. W. Wahab

Abstract: Video generally contains edited text (i.e., caption text) and natural text (i.e., scene text), and the two differ in both nature and characteristics. These differences lead to poor text recognition accuracy in video. In this paper, we explore wavelet decomposition and temporal coherency for the classification of caption and scene text. We use the high-frequency wavelet sub-bands to separate text candidates, which are represented by high-frequency coefficients in an input word. The proposed method studies the distribution of text candidates over word images, based on the observation that the standard deviation of text candidates is high in the first zone, low in the middle zone, and high in the third zone. This distribution is extracted by mapping standard deviation values to 8 equal-sized bins formed from the range of standard deviation values. The correlation among bins at the first and second wavelet levels is explored to differentiate caption and scene text and to determine the number of temporal frames to be analyzed. The properties of caption and scene text are validated over the chosen temporal frames to find the stable property for classification. Experimental results on three standard datasets (ICDAR 2015, YVT, and License Plate Video) show that the proposed method outperforms existing methods in terms of classification rate and that the classification results significantly improve the recognition rate.

Citations: 4
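The binning step mentioned in the abstract of "Temporal Integration for Word-Wise Caption and Scene Text Identification" (mapping standard deviation values to 8 bins formed from their range) might look roughly like the sketch below. This is an illustration only, not the authors' implementation: the function name `std_to_bins`, the equal-width interpretation of "equal sized", and the sample values are all assumptions.

```python
def std_to_bins(std_values, n_bins=8):
    """Map each standard deviation value to one of n_bins equal-width bins.

    Assumption: "equal sized" is read as equal width over [min, max]
    of the observed values. The abstract does not spell this out.
    """
    lo, hi = min(std_values), max(std_values)
    if hi == lo:
        # All values identical: everything falls into the first bin.
        return [0 for _ in std_values]
    width = (hi - lo) / n_bins
    # The maximum value would compute to index n_bins, so clamp it
    # into the last bin (index n_bins - 1).
    return [min(int((v - lo) / width), n_bins - 1) for v in std_values]


# Hypothetical per-zone standard deviations, for illustration only.
sample = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
bin_indices = std_to_bins(sample)
```

The resulting bin indices could then be compared across wavelet levels, which is where the abstract's correlation step would come in; that part is not sketched here.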

Rank-Reducing Two-Dimensional Grammars for Document Layout Analysis

2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2017-11-01 DOI: 10.1109/ICDAR.2017.185

D. Prusa, Akio Fujiyoshi

Citations: 5

1990 US Census Form Recognition Using CTC Network, WFST Language Model, and Surname Correction

2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2017-11-01 DOI: 10.1109/ICDAR.2017.163

Huaigu Cao, Stephen Rawls, P. Natarajan

Citations: 2

Normalised Local Naïve Bayes Nearest-Neighbour Classifier for Offline Writer Identification

2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2017-11-01 DOI: 10.1109/ICDAR.2017.168

H. Mohammed, V. Märgner, T. Konidaris, H. Siegfried Stiehl

Citations: 25

Multi-Scale Multi-Task FCN for Semantic Page Segmentation and Table Detection

2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2017-11-01 DOI: 10.1109/ICDAR.2017.50

Dafang He, Scott D. Cohen, Brian L. Price, Daniel Kifer, C. Lee Giles

Abstract: Page segmentation and table detection play an important role in understanding the structure of documents. We present a page segmentation algorithm that incorporates state-of-the-art deep learning methods for segmenting three types of document elements: text blocks, tables, and figures. We propose a multi-scale, multi-task fully convolutional neural network (FCN) for the tasks of semantic page segmentation and element contour detection. The semantic segmentation network accurately predicts the probability at each pixel of the three element classes. The contour detection network accurately predicts instance-level "edges" around each element occurrence. We propose a conditional random field (CRF) that uses features output by the semantic segmentation and contour networks to improve upon the semantic segmentation network output. Given the semantic segmentation output, we also extract individual table instances from the page using heuristic rules and a verification network that removes false positives. We show that, although we take only a page image as input, we produce results comparable to those of other methods that rely on PDF file information, heuristics, and hand-crafted features tailored to specific types of documents. Our approach learns representative features for page segmentation from real and synthetic training data. This learning-based property makes it more general than existing methods in terms of document types and element appearances. For example, our method reliably detects sparsely lined tables, which are hard for rule-based or heuristic methods.

Citations: 89

A Font Setting Based Bayesian Model to Extract Mathematical Expression in PDF Files

2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2017-11-01 DOI: 10.1109/ICDAR.2017.129

Xing Wang, Jyh-Charn S. Liu

Abstract: This paper proposes a Font Setting based Bayesian (FSB) model to extract mathematical expressions (MEs) from portable document format (PDF) files. The FSB model is a self-adaptive unsupervised algorithm that first uses rules to identify ME and non-ME (NME) content and then extracts the remaining MEs using Bayesian inference, based on the observation that MEs tend to be repeatedly represented in a particular style. PDF files are first processed with a PDF parser, and the document layout is analyzed with a projection-profile-cutting-based algorithm to detect columns and lines. Heuristic rules derived from knowledge of math usage and writing practices are employed to reason about the posterior probability of a character being ME vs. NME, conditional upon the font and value information. Based on the character-level posterior probability, Bayesian inference is used to infer whether a none-separable character set (NSCS) is an ME or not. Consecutive (fragmented) ME NSCSs are merged to produce the final results. Experimental results show that our approach achieves 0.006 (0.135) false rate and 0.111/0.093 miss rate for IME (EME) extraction. For NSCS classification, our approach achieves 93.1% precision, 90.5% recall, and an F1 score of 0.918. The processing time is markedly shorter than that of supervised machine learning techniques, and the extracted information and analytics products can be used for high-level applications.

Citations: 10
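As a quick consistency check on the NSCS classification figures quoted in the abstract of "A Font Setting Based Bayesian Model to Extract Mathematical Expression in PDF Files", the F1 score can be recomputed from the reported precision and recall using the standard harmonic-mean definition. The helper name `f1_score` is ours, not something taken from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (93.1% and 90.5%).
recomputed_f1 = round(f1_score(0.931, 0.905), 3)
print(recomputed_f1)  # 0.918
```

The recomputed value agrees with the reported F1 score of 0.918 to three decimal places.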

Handwriting Style Mixture Adaptation

2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2017-11-01 DOI: 10.1109/ICDAR.2017.166

Hong-Ming Yang, Xu-Yao Zhang, Fei Yin, Cheng-Lin Liu

Abstract: In handwriting recognition, the test data usually come from multiple writers who do not appear in the training data. Adapting the base classifier to the new style of each writer can therefore significantly improve generalization performance. Traditional writer adaptation methods usually assume that there is only one writer (one style) in the test data; we call this situation style-clear adaptation. A more common situation, however, is that multiple handwriting styles exist in the test data, which commonly occurs in multi-font documents and in handwriting data produced by several cooperating writers. We call adaptation in this situation style-mixture adaptation. To deal with this problem, we propose a novel method called K-style mixture adaptation (K-SMA), which assumes that there are K styles in total in the test data. Specifically, we first partition the test data into K groups (style clustering) according to their style consistency, which is measured by a newly designed style feature that eliminates class (category) information while keeping handwriting style information. After that, a style transfer mapping (STM) is used for writer adaptation within each group. Since the initial style clustering may not be reliable, we repeat this process iteratively to improve the adaptation performance. The K-SMA model is fully unsupervised and requires neither the class labels nor the style indices. Moreover, the K-SMA model can be effectively combined with benchmark convolutional neural network (CNN) models. Experiments on the online Chinese handwriting database CASIA-OLHWDB demonstrate that K-SMA is an efficient and effective solution for style-mixture adaptation.

Citations: 0

Capturing Handwritten Ink Strokes with a Fast Video Camera

2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2017-11-01 DOI: 10.1109/ICDAR.2017.209

Chelhwon Kim, Patrick Chiu, H. Oda

Citations: 1