Latest Publications — 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)

A Man-Machine Cooperating System Based on the Generalized Reject Model
Shunichi Kimura, E. Tanaka, Masanori Sekino, Takuya Sakurai, Satoshi Kubota, Ikken So, Y. Koshi
DOI: 10.1109/ICDAR.2017.218
Abstract: In recognition systems, reject options are usually introduced to reduce the error rates of general classifiers. This option entails a trade-off between error rates and reject rates, and that trade-off must be optimized. Conventional methods implicitly assume that the error rate is zero after rejection; in real systems, however, errors occur even after rejection. In this paper, we propose a generalized reject model that accounts for post-rejection error rates. The model can handle a variety of systems with multiple classifiers and thresholds, and the error-reject trade-off can be optimized by defining and minimizing a cost function over the model. Finally, experimental results from applying the model to data entry systems show its effectiveness.
Citations: 1
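The error-reject trade-off the abstract describes can be illustrated with a toy cost function. This is a minimal sketch, not the paper's actual formulation: the function names, cost weights, and synthetic data below are all illustrative assumptions. The key idea from the paper is the `post_reject_err` term, a non-zero residual error rate after rejection.

```python
import numpy as np

def expected_cost(conf, correct, t, c_err=1.0, c_rej=0.2, post_reject_err=0.05):
    """Toy cost for a reject rule: accept a sample iff confidence >= t.

    c_err: cost of an error on an accepted sample.
    c_rej: handling cost of a rejected sample (e.g. manual data entry).
    post_reject_err: residual error rate after rejection -- the generalized
    model's key point is that this is non-zero in real systems.
    """
    accepted = conf >= t
    n = len(conf)
    error_rate = np.sum(accepted & ~correct) / n   # accepted but wrong
    reject_rate = np.sum(~accepted) / n            # sent to the fallback path
    return c_err * error_rate + reject_rate * (c_rej + c_err * post_reject_err)

# Pick the threshold minimizing expected cost on held-out (here: synthetic) data.
rng = np.random.default_rng(0)
conf = rng.uniform(0, 1, 1000)
correct = rng.uniform(0, 1, 1000) < conf           # higher confidence -> more often right
ts = np.linspace(0, 1, 101)
best_t = min(ts, key=lambda t: expected_cost(conf, correct, t))
```

By construction the grid search returns a threshold no worse than always accepting (t = 0) or always rejecting (t = 1); the trade-off curve shifts as `post_reject_err` grows.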
DANIEL: A Deep Architecture for Automatic Analysis and Retrieval of Building Floor Plans
Divya Sharma, Nitin Gupta, C. Chattopadhyay, S. Mehta
DOI: 10.1109/ICDAR.2017.76
Abstract: Automatically finding existing building layouts in a repository helps an architect reuse designs and complete projects on time. In this paper, we propose Deep Architecture for fiNdIng alikE Layouts (DANIEL). Using DANIEL, an architect can search an existing repository of layouts (floor plans) and give accurate recommendations to buyers. DANIEL can also recommend to a property buyer, given a floor plan image, a rank-ordered list of similar layouts. DANIEL uses the deep learning paradigm to extract both low- and high-level semantic features from a layout image. The key contributions of the proposed approach are: (i) a novel deep learning framework to retrieve similar floor plan layouts from a repository; (ii) an analysis of the effect of individual deep convolutional neural network layers on the floor plan retrieval task; and (iii) the creation of a new complex dataset, ROBIN (Repository Of BuildIng plaNs), with three broad categories and 510 real-world floor plans. We evaluated DANIEL through extensive experiments on ROBIN and compared our results with eight state-of-the-art methods to demonstrate its effectiveness in challenging scenarios.
Citations: 49
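At query time, CNN-feature retrieval of the kind DANIEL performs reduces to nearest-neighbour search in feature space. A minimal cosine-similarity sketch follows; the random vectors stand in for the network's layer activations, which the paper extracts but does not reduce to this simple scheme:

```python
import numpy as np

def retrieve(query_feat, repo_feats, k=3):
    """Rank repository layouts by cosine similarity to the query feature."""
    q = query_feat / np.linalg.norm(query_feat)
    R = repo_feats / np.linalg.norm(repo_feats, axis=1, keepdims=True)
    sims = R @ q
    order = np.argsort(-sims)[:k]                  # top-k, most similar first
    return order, sims[order]

# Stand-in features: in DANIEL these would come from a CNN layer.
rng = np.random.default_rng(1)
repo = rng.normal(size=(510, 64))                  # 510 plans, as in ROBIN
query = repo[42] + 0.01 * rng.normal(size=64)      # near-duplicate of plan 42
idx, sims = retrieve(query, repo, k=3)             # the near-duplicate should rank first
```

Normalizing both sides makes the dot product equal to cosine similarity, so retrieval is a single matrix-vector product over the whole repository.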
Early Recognition of Handwritten Gestures Based on Multi-Classifier Reject Option
Zhaoxin Chen, É. Anquetil, C. Viard-Gaudin, H. Mouchère
DOI: 10.1109/ICDAR.2017.43
Abstract: This paper presents a multi-classifier method for the early recognition of handwritten gestures. Unlike other work, which studies early recognition as a function of time, we make the recognition depend on how much of the gesture has been drawn. We train a multi-classifier, indexed by segment length, to recognize a handwritten touch gesture as early as possible. To deal with potentially similar beginnings of different gestures, we introduce a reject option that postpones the decision while ambiguity persists. We report results on two freely available datasets, MGSet and ILG, which demonstrate the improvement obtained by using the proposed reject option for the early recognition of handwritten gestures.
Citations: 5
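A common way to implement a reject option of this kind is a rule on the gap between the two highest class posteriors: while the partial drawing is ambiguous (small gap), the decision is postponed. The margin value and the posterior vectors below are illustrative, not taken from the paper.

```python
def classify_with_reject(posteriors, margin=0.2):
    """Return the predicted class index, or None (postpone the decision)
    when the gap between the top two posteriors is below the margin,
    i.e. the partial gesture is still ambiguous between two classes."""
    ranked = sorted(range(len(posteriors)), key=lambda i: -posteriors[i])
    top, second = ranked[0], ranked[1]
    if posteriors[top] - posteriors[second] < margin:
        return None                                # reject: wait for more ink
    return top

# Early in the stroke two gestures look alike -> reject; later it is clear.
assert classify_with_reject([0.45, 0.40, 0.15]) is None
assert classify_with_reject([0.80, 0.15, 0.05]) == 0
```

Lowering the margin trades earlier decisions against a higher risk of committing to the wrong gesture, the same trade-off the paper optimizes.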
Convolutional Neural Networks for Figure Extraction in Historical Technical Documents
Chun-Nam Yu, Caleb C. Levy, I. Saniee
DOI: 10.1109/ICDAR.2017.134
Abstract: We present a method for extracting figures and images from the pages of scanned documents, especially technical research articles. Our approach is novel in two key ways. First, we treat the task as a computer vision problem and train convolutional neural networks to recognize figures in scanned pages. Second, we generate our training data from 'born-digital' structured documents, which lets us produce labels for the training set automatically using PDF figure extractors and avoids the tedious task of hand-labelling thousands of document pages. Our convolutional neural networks achieve precision and recall close to 85% in identifying figures in a test set of modern journal papers and conference proceedings, and above 80% on an application data set of historical technical documents scanned from the Bell Labs Records. The results show that models trained on digital documents transfer very well to historical scans. Finally, the models are easy to extend to other document elements such as tables and captions.
Citations: 2
Localizing and Recognizing Labels for Multi-Panel Figures in Biomedical Journals
Jie Zou, Sameer Kiran Antani, G. Thoma
DOI: 10.1109/ICDAR.2017.128
Abstract: Multi-panel figures are common in biomedical journals, and the subpanels are often of different types, e.g. x-ray, microscopy, or sketch. Visual information retrieval of such figures can benefit significantly from panel label recognition techniques that index figures for search engines, tag image content, and correlate panels with figure (sub)captions. The task is challenging because of large variation in label locations, sizes, contrast against the background, and so on. In this work, we propose a three-stage recognition algorithm. The first stage is formulated as object detection: we extract Histogram of Oriented Gradients (HOG) features, train a linear Support Vector Machine (SVM) classifier, detect label candidates with sliding windows at different locations and scales, and train a convolutional neural network (CNN) to remove false positives. The second stage is formulated as image classification: a 50-class RBF SVM classifier estimates the posterior probabilities of each candidate label. The last stage is formulated as sequence classification: a beam search over the posterior probabilities from the second stage, together with a set of label sequence constraints, selects an optimal label sequence. The algorithm is trained on 9,642 figures, and evaluation on the remaining 1,000 figures shows that it achieves good precision and recall.
Citations: 4
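The final stage, selecting a label sequence from per-candidate posteriors under sequence constraints, can be sketched as a small beam search. The constraint used here (panel labels must run consecutively as 'a', 'b', 'c', …, with each candidate either assigned the next letter or dropped as a false positive) and the drop penalty are illustrative assumptions, not the paper's exact rules.

```python
import math

def beam_search_labels(posteriors, beam=5, drop_logp=math.log(0.05)):
    """posteriors: one dict per detected candidate, mapping letter -> probability.
    A beam state is (log-prob, index of next expected letter, labels so far);
    each candidate either receives the next consecutive letter or is dropped."""
    beams = [(0.0, 0, [])]
    for post in posteriors:
        expanded = []
        for logp, nxt, labels in beams:
            letter = chr(ord('a') + nxt)
            p = post.get(letter, 0.0)
            if p > 0:                                  # assign the next letter
                expanded.append((logp + math.log(p), nxt + 1, labels + [letter]))
            # or treat the candidate as a false positive
            expanded.append((logp + drop_logp, nxt, labels + [None]))
        beams = sorted(expanded, key=lambda s: -s[0])[:beam]
    return beams[0][2]

# Three candidates in reading order; the middle detection is weak and spurious.
posts = [{'a': 0.9, 'b': 0.1},
         {'a': 0.1, 'b': 0.02},
         {'b': 0.8, 'c': 0.2}]
labels = beam_search_labels(posts)                     # -> ['a', None, 'b']
```

The consecutiveness constraint is what lets a strong 'b' candidate later in the sequence overrule a weak, ambiguous detection in between.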
Binarizing Document Images Acquired with Portable Cameras
R. Lins, R. Bernardino, D. Jesus, José Mário Oliveira
DOI: 10.1109/ICDAR.2017.348
Abstract: Although made for "family photos", portable digital cameras, either standalone or embedded in cell phones, are today often used to photograph documents. In general, such photos are sent over networks and then viewed on desktops, printed, or even transcribed via OCR, and binarization may play an important role in this scheme. This paper follows the idea that "no binarization algorithm is good for all kinds of images". Non-uniform illumination, possible interference from light sources in the environment, and non-uniform resolution are some of the problems found in photographed document images that are absent from their scanned counterparts. The paper presents a new methodology for assessing binarization algorithms on different devices, taking into account the difficulties listed and the particularities of the cameras and documents.
Citations: 10
A Rectangle Mining Method for Understanding the Semantics of Financial Tables
Xilun Chen, Laura Chiticariu, Marina Danilevsky, A. Evfimievski, P. Sen
DOI: 10.1109/ICDAR.2017.52
Abstract: Financial statements report crucial information in tables with complex semantic structure, which are desirable, yet challenging, to interpret automatically. For example, in such tables a row of data cells is often explained by the headers of other rows. In a departure from prior art, we propose a rectangle mining framework for understanding complex tables that considers rectangular regions rather than individual cells or pairs of cells. We instantiate the framework with ReMine, an algorithm for extracting the row header semantics of a table, and show that it significantly outperforms prior pair-wise classification approaches on two datasets: (i) a set of manually labeled financial tables from multiple companies, and (ii) the ICDAR 2013 Table Competition dataset.
Citations: 10
Music Document Layout Analysis through Machine Learning and Human Feedback
Jorge Calvo-Zaragoza, Kecheng Zhang, Z. Saleh, Gabriel Vigliensoni, Ichiro Fujinaga
DOI: 10.1109/ICDAR.2017.259
Abstract: Music documents often include musical symbols as well as other relevant elements such as staff lines, text, and decorations. To detect and separate these constituent elements, we propose a machine learning based layout analysis framework that performs pixel-level classification of the image, using supervised classifiers trained to infer the category of each pixel. In addition, our scenario follows a human-aided computing approach in which the user is part of the recognition loop, providing feedback where relevant errors are made.
Citations: 4
GMU: A Novel RNN Neuron and Its Application to Handwriting Recognition
Li Sun, Tonghua Su, Shengjie Zhou, Lijun Yu
DOI: 10.1109/ICDAR.2017.176
Abstract: Recurrent neural networks (RNNs) are widely used for sequence labeling. Decades of research show that the artificial neuron, as the building block, plays a great role in their success. Different RNN neurons, such as the long short-term memory (LSTM) and the gated recurrent unit (GRU), have been proposed and used in most applications, including character recognition, to encode long-term contextual dependencies. Inspired by both LSTM and GRU, we present a new structure named the gated memory unit (GMU), which carries forward their merits. GMU preserves the constant error carousel (CEC), which ensures a smooth flow of information, and borrows both the cell structure of LSTM and the interpolation gates of GRU. The proposed neuron is evaluated on online English and online Chinese handwriting recognition tasks in terms of parameter volume, convergence, and accuracy. The results show that GMU is a promising choice for handwriting recognition tasks.
Citations: 6
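The abstract does not give the GMU equations, so the cell below is only an illustrative approximation of the ingredients it names: a GRU-style interpolation gate updating an additive state, which keeps a constant-error-carousel-like path through time. All weight shapes, initializations, and the exact update rule are assumptions, not the published GMU.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GatedCell:
    """Illustrative recurrent cell in the spirit of GMU: candidate content is
    blended into the state through an interpolation gate (GRU-like), so the
    state update is additive and gradients flow along a CEC-like path.
    Not the paper's exact equations."""

    def __init__(self, n_in, n_hid, seed=0):
        rng = np.random.default_rng(seed)
        scale = 0.1
        self.Wz = rng.normal(0, scale, (n_hid, n_in + n_hid))  # interpolation gate
        self.Wc = rng.normal(0, scale, (n_hid, n_in + n_hid))  # candidate content
        self.bz = np.zeros(n_hid)
        self.bc = np.zeros(n_hid)

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh + self.bz)            # how much to update
        c = np.tanh(self.Wc @ xh + self.bc)            # candidate state
        return (1.0 - z) * h + z * c                   # convex interpolation

cell = GatedCell(n_in=8, n_hid=16)
h = np.zeros(16)
for t in range(5):                                     # run over a 5-step sequence
    h = cell.step(np.ones(8), h)
```

Because each step is a convex combination of the previous state and a tanh-bounded candidate, the state stays bounded in (-1, 1), one reason gated units train more stably than vanilla RNN neurons.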
A Comparative Study on Optical Modeling Units for Off-Line Arabic Text Recognition
Mohamed Benzeghiba
DOI: 10.1109/ICDAR.2017.170
Abstract: The role of the optical model in a text recognition system is to model the textual information written in document images. This paper compares the performance of four Arabic optical modeling units in a state-of-the-art Arabic text recognition system based on Multi-Dimensional Long Short-Term Memory. The units are: 1) isolated characters; 2) isolated characters extended with the different shapes of Lam-Alef; 3) character shapes within their contexts; and 4) the recently proposed sub-character units, which allow similar patterns to be shared across different character shapes. Experiments are conducted on six tasks using the Maurdor and Khatt databases. For a fair comparison, the optical models are trained from scratch, and decoding is performed 1) using the predictions of the optical model only and 2) combined with a 3-gram hybrid word/part-of-Arabic-word language model. Results in terms of Word Error Rate show that the best results are generally obtained with systems using isolated characters as the basic modeling units, although the differences in performance among the systems are negligible.
Citations: 3