Latest Publications from the 2019 International Conference on Document Analysis and Recognition (ICDAR)

Identifying the Central Figure of a Scientific Paper
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date: 2019-09-01 DOI: 10.1109/ICDAR.2019.00173
Sean T. Yang, Po-Shen Lee, L. Kazakova, Abhishek Joshi, B. M. Oh, Jevin D. West, B. Howe
Publishers are increasingly using graphical abstracts to facilitate scientific search, especially across disciplinary boundaries. They are presented across various media, are easily shared, and are information rich. However, only a small fraction of scientific publications are equipped with graphical abstracts. What can we do with the vast majority of papers that have no selected graphical abstract? In this paper, we first hypothesize that scientific papers actually include a "central figure" that serves as a graphical abstract. These figures convey the key results and provide a visual identity for the paper. Using survey data collected from 6,263 authors regarding 8,353 papers over 15 years, we find that over 87% of papers are considered to contain a central figure, and that these central figures are primarily used to summarize important results, explain the key methods, or provide additional discussion. We then train a model to automatically recognize the central figure, achieving top-3 accuracy of 78% and exact-match accuracy of 34%. We find that the primary boost in accuracy comes from figure captions that resemble the abstract. We make all our data and results publicly available at https://github.com/viziometrics/centraul_figure. Our goal is to automate central figure identification to improve search engine performance and to help scientists connect ideas across the literature.
Citations: 4
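The abstract reports that caption-abstract similarity is the primary accuracy signal. As a rough illustration of that idea only (not the authors' published model; the function name and the TF-IDF featurization are assumptions), one could rank a paper's figures by how closely each caption matches the abstract:

```python
# Hypothetical sketch: score each figure caption against the paper's
# abstract with TF-IDF cosine similarity and rank figures accordingly.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_figures_by_caption(abstract: str, captions: list[str]) -> list[int]:
    """Return figure indices sorted from most to least abstract-like caption."""
    tfidf = TfidfVectorizer(stop_words="english")
    matrix = tfidf.fit_transform([abstract] + captions)
    scores = cosine_similarity(matrix[0], matrix[1:]).ravel()
    return scores.argsort()[::-1].tolist()

abstract_text = "We train a model to automatically recognize the central figure ..."
captions = [
    "Figure 1: Overview of the survey instrument.",
    "Figure 2: Model accuracy for central figure recognition.",
]
print(rank_figures_by_caption(abstract_text, captions))  # e.g. [1, 0]
```

The top-ranked figure would then be proposed as the central figure candidate.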
Bigram Label Regularization to Reduce Over-Segmentation on Inline Math Expression Detection
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date: 2019-09-01 DOI: 10.1109/ICDAR.2019.00069
Xing Wang, Zelun Wang, Jyh-Charn S. Liu
An inline mathematical expression is a math expression (ME) that is blended into the plaintext sentences of a scientific paper. Detecting inline MEs is a non-trivial problem due to the unrestricted use of font styles and blurred boundaries with plaintext in scientific publications. For instance, many inline MEs detected by existing algorithms are incorrectly split into multiple parts due to the misidentification of a few characters. In this paper, we propose a bigram regularization model to resolve the split problem in inline ME detection. The model incorporates neighboring constraints when labeling tokens as ME vs. plaintext. Experimental results show that this technique significantly reduces splits of inline MEs, with only small increases in the false and miss rates. In comparison with a CRF model, our model achieves a higher F1 score and a lower miss rate.
Citations: 6
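A minimal sketch of the core idea under assumed scores (the paper's bigram regularization model is not necessarily this exact decoder): penalizing label changes between neighboring tokens keeps a single misidentified character from splitting an expression.

```python
# Minimal sketch: per-token math/plaintext scores plus a bigram penalty on
# label changes, decoded with Viterbi. The scores below are made up.
import numpy as np

def viterbi_bigram(unary: np.ndarray, switch_cost: float = 2.0) -> list[int]:
    """unary[t, k] = score of label k (0 = plaintext, 1 = math) at token t."""
    T, K = unary.shape
    trans = -switch_cost * (1 - np.eye(K))      # penalize neighboring label changes
    best, back = unary[0].copy(), np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = best[:, None] + trans            # cand[i, j]: prev label i -> label j
        back[t] = cand.argmax(axis=0)
        best = cand.max(axis=0) + unary[t]
    path = [int(best.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Token 2 sits inside an expression but its raw scores favor plaintext;
# per-token argmax would over-segment the ME as [0, 1, 0, 1, 0].
scores = np.array([[3, 0], [0, 3], [1.2, 1.0], [0, 3], [3, 0]], dtype=float)
print(viterbi_bigram(scores))  # -> [0, 1, 1, 1, 0]
```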
Table Detection in Invoice Documents by Graph Neural Networks
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date: 2019-09-01 DOI: 10.1109/ICDAR.2019.00028
Pau Riba, Anjan Dutta, Lutz Goldmann, A. Fornés, O. R. Terrades, J. Lladós
Tabular structures in documents offer a complementary dimension to the raw textual data, representing logical or quantitative relationships among pieces of information. In digital mailroom applications, where a large number of administrative documents must be processed with reasonable accuracy, the detection and interpretation of tables is crucial. Table recognition has gained interest in document image analysis, in particular for unconstrained formats (absence of ruling lines, no prior knowledge of the number of rows and columns). In this work, we propose a graph-based approach for detecting tables in document images. Instead of using the raw content (recognized text), we make use of location, context, and content type; it is thus a purely structural perception approach that does not depend on the language or the quality of the text recognition. Our framework uses Graph Neural Networks (GNNs) to describe the local repetitive structural information of tables in invoice documents. Our proposed model has been experimentally validated on two invoice datasets and achieved encouraging results. Additionally, due to the scarcity of benchmark datasets for this task, we have contributed to the community a novel dataset derived from the RVL-CDIP invoice data. It will be publicly released to facilitate future research.
Citations: 61
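To make the structure-only idea concrete, here is a toy sketch (every name and dimension is an assumption, and the paper's GNN is certainly richer): nodes carry layout features such as position, size, and content type, edges connect spatial neighbors, and one round of mean-aggregation message passing mixes each node with its neighborhood before a classifier would score table membership.

```python
# Toy structural pipeline: k-nearest-neighbour graph over box centroids,
# then one mean-aggregation message-passing step. No text content is used.
import numpy as np

def knn_edges(centroids: np.ndarray, k: int = 2) -> list[tuple[int, int]]:
    d = np.linalg.norm(centroids[:, None] - centroids[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)                       # no self-loops
    return [(i, int(j)) for i in range(len(d)) for j in d[i].argsort()[:k]]

def mean_message_pass(x, edges, w_self, w_neigh):
    out = x @ w_self
    for i in range(len(x)):
        neigh = [j for a, j in edges if a == i]
        if neigh:
            out[i] += x[neigh].mean(axis=0) @ w_neigh
    return np.maximum(out, 0)                         # ReLU

rng = np.random.default_rng(0)
boxes = np.array([[10, 10], [60, 10], [110, 10], [10, 40]], dtype=float)
feats = rng.random((4, 5))                            # location/size/content-type features
h = mean_message_pass(feats, knn_edges(boxes), rng.random((5, 8)), rng.random((5, 8)))
print(h.shape)  # (4, 8) node embeddings for a downstream table/non-table classifier
```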
CNN-BLSTM-CRF Network for Semantic Labeling of Students' Online Handwritten Assignments
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date: 2019-09-01 DOI: 10.1109/ICDAR.2019.00169
Amirali Darvishzadeh, T. Stahovich, Amir H. Feghahati, Negin Entezari, Shaghayegh Gharghabi, Reed Kanemaru, C. Shelton
Automatic semantic labeling of strokes in online handwritten documents is a crucial task for many applications such as diagram interpretation, text recognition, and search. We formulate this task as a stroke classification problem in which each stroke is classified as a cross-out, free body diagram, or text. Separating free body diagrams from text in this work differs from the traditional text/non-text separation problem because both classes contain text and graphics: the text class includes textual notes, mathematical symbols/equations, and graphics such as arrows that connect other elements, while the free body diagram class also contains graphics and various alphanumeric characters and symbols that mark or explain the graphical objects. In this work, we present a novel deep neural network model for the classification of strokes in online handwritten documents. The network takes two input sequences: the first contains the trajectories of the pen strokes, while the second contains features of the strokes. Each sequence is fed to its own CNN-BLSTM channel to extract features and encode relationships between nearby strokes. The outputs of the two channels are concatenated and used as the input to a CRF layer that predicts the best sequence of labels for the given input sequences. We evaluated our model on a dataset of 1,060 pages written by 132 students in an undergraduate statics course. Our model achieved an overall classification accuracy of 94.70% on this dataset.
Citations: 2
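The two-channel architecture is concrete enough to sketch. In the sketch below, all dimensions and layer sizes are assumptions, and a plain linear layer stands in for the paper's CRF layer:

```python
# Schematic two-channel CNN-BLSTM encoder over stroke sequences; the two
# channel outputs are concatenated per stroke, as the abstract describes.
import torch
import torch.nn as nn

class Channel(nn.Module):
    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        self.conv = nn.Conv1d(in_dim, hidden, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True, bidirectional=True)

    def forward(self, x):                      # x: (batch, seq, in_dim)
        h = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        out, _ = self.lstm(h)                  # (batch, seq, 2 * hidden)
        return out

class StrokeLabeler(nn.Module):
    def __init__(self, traj_dim: int = 2, feat_dim: int = 16, n_classes: int = 3):
        super().__init__()
        self.traj = Channel(traj_dim)          # pen-trajectory channel
        self.feat = Channel(feat_dim)          # stroke-feature channel
        self.emit = nn.Linear(4 * 64, n_classes)  # the paper uses a CRF on top

    def forward(self, traj, feat):
        return self.emit(torch.cat([self.traj(traj), self.feat(feat)], dim=-1))

scores = StrokeLabeler()(torch.randn(1, 20, 2), torch.randn(1, 20, 16))
print(scores.shape)  # (1, 20, 3): cross-out / free body diagram / text per stroke
```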
Breaking the Code on Broken Tablets: The Learning Challenge for Annotated Cuneiform Script in Normalized 2D and 3D Datasets
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date: 2019-09-01 DOI: 10.1109/ICDAR.2019.00032
H. Mara, B. Bogacz
The number of known cuneiform tablets is assumed to be in the hundreds of thousands. The Hilprecht Archive Online contains 1,977 high-resolution 3D scans of tablets, and the online cuneiform database CDLI catalogs metadata for more than 100,000 tablets. While both are publicly accessible, large-scale machine learning and pattern recognition on cuneiform tablets remain elusive: the data is only accessible by searching web pages, the tablet identifiers are inconsistent between collections, and the 3D data is unprepared and challenging for automated processing. We pave the way for large-scale analyses of cuneiform tablets by assembling a cross-referenced benchmark dataset of processed cuneiform tablets: (i) frontally aligned 3D tablets with pre-computed high-dimensional surface features, (ii) six-view raster images for off-the-shelf image processing, and (iii) metadata, transcriptions, and transliterations for a subset of 707 tablets, for learning alignments between 3D data, images, and linguistic expression. This is the first dataset of its kind and of its size in cuneiform research. The benchmark dataset is prepared for ease of use and immediate availability to computational researchers, lowering the barrier to experimenting with and applying standard methods of analysis, at https://doi.org/10.11588/data/IE8CCN.
Citations: 7
A Robust Data Hiding Scheme Using Generated Content for Securing Genuine Documents
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date: 2019-09-01 DOI: 10.1109/ICDAR.2019.00131
Vinh Loc Cu, J. Burie, J. Ogier, Cheng-Lin Liu
Compared with pervasive black-and-white code patterns such as barcodes and quick-response codes, data hiding is an effective technique for securing document images against forgery or unauthorized intervention. In this work, we propose a robust digital watermarking scheme for securing genuine documents by leveraging generative adversarial networks (GANs). First, the input document is rectified by geometric correction. Next, a generated document is obtained from the input document using these networks; it serves as the reference for data hiding and detection. We then introduce an algorithm that hides secret information in the document and produces a watermarked document whose content is minimally distorted under normal observation. We also present a method that detects the hidden data from the watermarked document by measuring the distance between pixel values of the generated and watermarked documents. To improve security, the secret information is encoded with pseudo-random numbers before it is hidden. Lastly, we demonstrate that our approach yields high data-detection precision and performance competitive with state-of-the-art approaches.
Citations: 3
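The detection step, reading pixel distances between the generated reference and the watermarked image, can be illustrated with a much-simplified toy. Everything below is assumed for illustration: one bit per block encoded as a small intensity shift, and the unmodified image standing in for the GAN-generated reference.

```python
# Toy embed/detect pair: bit = 1 shifts a block's intensity by delta; the
# detector thresholds the mean per-block distance to the reference image.
import numpy as np

def embed(img: np.ndarray, bits: list[int], block: int = 8, delta: int = 6) -> np.ndarray:
    out = img.astype(np.int16)
    cols = img.shape[1] // block
    for n, b in enumerate(bits):
        r, c = divmod(n, cols)
        if b:
            out[r*block:(r+1)*block, c*block:(c+1)*block] += delta
    return np.clip(out, 0, 255).astype(np.uint8)

def detect(reference: np.ndarray, marked: np.ndarray, n_bits: int,
           block: int = 8, delta: int = 6) -> list[int]:
    diff = marked.astype(np.int16) - reference.astype(np.int16)
    cols = reference.shape[1] // block
    return [int(diff[r*block:(r+1)*block, c*block:(c+1)*block].mean() > delta / 2)
            for r, c in (divmod(n, cols) for n in range(n_bits))]

page = np.full((32, 32), 220, dtype=np.uint8)      # stand-in for a document patch
print(detect(page, embed(page, [1, 0, 1, 1]), 4))  # -> [1, 0, 1, 1]
```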
OBC306: A Large-Scale Oracle Bone Character Recognition Dataset
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date: 2019-09-01 DOI: 10.1109/ICDAR.2019.00114
Shuangping Huang, Haobin Wang, Yong-ge Liu, Xiaosong Shi, Lianwen Jin
The oracle bone script from ancient China is among the world's most famous ancient writing systems. Identifying and deciphering oracle bone script is one of the most important topics in oracle bone study and requires deep familiarity with the culture of ancient China. The task remains very challenging for two reasons: first, it is executed mainly by humans and requires a high level of experience, aptitude, and commitment; second, domain-specific data is scarce, which hinders the advancement of automatic recognition research. A collection of well-labeled oracle bone data is necessary to bridge the oracle bone and information processing fields; however, such a dataset has not yet been presented. Hence, in this paper, we construct a new large-scale dataset of oracle bone characters called OBC306. We also present a standard deep convolutional neural network-based evaluation on this dataset to serve as a benchmark. Through statistical and visual analyses, we describe the inherent difficulties of oracle bone recognition and propose future challenges for and extensions of oracle bone study using information processing. The dataset contains more than 300,000 character-level samples cropped from oracle bone rubbings or images and covers 306 glyph classes, making it, to the best of our knowledge, the largest existing raw oracle bone character set. We anticipate that the publication of this dataset will facilitate the development of oracle bone research and lead to optimal algorithmic solutions.
Citations: 20
Logo Design Analysis by Ranking
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date: 2019-09-01 DOI: 10.1109/ICDAR.2019.00238
Takuro Karamatsu, D. Suehiro, S. Uchida
In this paper, we analyze logo designs using machine learning, as a promising trial of graphic design analysis. Specifically, we focus on favicon images, the tiny logos used as company icons in web browsers, and analyze them to understand their trends within individual industry classes. For example, if we can catch the subtle trends in the favicons of financial companies, they will suggest how professional designers graphically express the atmosphere of a financial company. For this purpose, we use top-rank learning, a recent machine learning method for ranking that is well suited to revealing subtle trends in graphic designs.
Citations: 1
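Schematically, top-rank learning optimizes the very top of a ranking. One common formulation (an assumption here, not necessarily the authors' exact objective) penalizes positives only against the highest-scoring negative, so the model concentrates on the head of the list where subtle class trends live:

```python
# Sketch of a top-rank objective: positives must beat the top-scoring
# negative by a margin; lower-ranked negatives do not affect the loss.
import torch

def top_rank_loss(pos_scores: torch.Tensor, neg_scores: torch.Tensor) -> torch.Tensor:
    top_negative = neg_scores.max()
    return torch.clamp(1.0 - (pos_scores - top_negative), min=0.0).mean()

pos = torch.tensor([2.1, 0.4, 1.7], requires_grad=True)  # e.g. financial-company favicons
neg = torch.tensor([0.9, 1.5, -0.2])                     # favicons from other industries
loss = top_rank_loss(pos, neg)
loss.backward()
print(loss.item())  # only the 1.5-scoring negative sets the margin
```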
Target-Directed MixUp for Labeling Tangut Characters
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date: 2019-09-01 DOI: 10.1109/ICDAR.2019.00041
Guangwei Zhang, Yinliang Zhao
Deep learning has greatly improved performance on computer vision and image understanding tasks, but it depends on large training datasets of labeled images. Labeling data is usually expensive and time-consuming, although unlabeled data is much easier to obtain. Because of limited budgets and emerging new categories, it is practical to build the training dataset iteratively from a small set of manually labeled data. The labeled data can be used not only for training the model, but also for mining knowledge to find examples of classes not yet included in the training dataset. Mixup [1] improves a model's accuracy and generalization by augmenting the training dataset with "virtual examples" generated by mixing pairs of randomly selected examples from the training dataset. Motivated by Mixup, we propose the Target-Directed Mixup (TDM) method for building the training dataset of a deep learning-based Tangut character recognition system. Virtual examples are generated by mixing two or more similar examples in the training dataset together with target examples of unseen classes that need to be labeled, a form of generative few-shot learning. This method can help expand the training dataset by finding real examples of unseen Tangut characters, and can provide virtual examples to represent rare characters that occur only a few times in historical documents. According to our experiments, TDM can help recognize unseen examples with an accuracy of 80% from only 4 to 5 real target examples, which greatly reduces human labor in data annotation.
Citations: 0
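The generation step echoes Mixup's convex combination of examples. A rough sketch under assumptions (Dirichlet mixing weights, flattened images, and all names invented here for illustration):

```python
# Sketch: synthesize a virtual example for an unseen class by convexly
# mixing one real target example with visually similar training examples.
import numpy as np

rng = np.random.default_rng(0)

def target_directed_mix(similar: np.ndarray, target: np.ndarray,
                        alpha: float = 0.4) -> np.ndarray:
    lam = rng.dirichlet([alpha] * (len(similar) + 1))  # mixing weights, sum to 1
    stack = np.vstack([target[None, :], similar])
    return lam @ stack                                 # weighted average image

target = rng.random(64 * 64)            # one of the 4-5 real target examples
neighbours = rng.random((3, 64 * 64))   # similar characters already labeled
virtual = target_directed_mix(neighbours, target)
print(virtual.shape)  # (4096,), labeled as the target (unseen) class
```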
Graphical Object Detection in Document Images
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date: 2019-09-01 DOI: 10.1109/ICDAR.2019.00018
Ranajit Saha, Ajoy Mondal, C. V. Jawahar
Graphical elements, particularly tables and figures, contain a visual summary of the most valuable information in a document. Localizing such graphical objects in document images is therefore the initial step toward understanding their content. In this paper, we present a novel end-to-end trainable deep learning-based framework, called Graphical Object Detection (GOD), to localize graphical objects in document images. Our framework is data-driven and does not require any heuristics or metadata to locate graphical objects. GOD exploits transfer learning and domain adaptation to handle the scarcity of labeled training images for the graphical object detection task. Performance analysis on various public benchmark datasets (ICDAR-2013, ICDAR-POD2017, and UNLV) shows that our model yields promising results compared to state-of-the-art techniques.
Citations: 47
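One plausible reading of the transfer-learning setup (an assumption; the paper does not ship this code) is standard detector fine-tuning: start from a model pretrained on natural images and swap its head for the graphical-object classes.

```python
# Sketch: adapt a pretrained Faster R-CNN to table/figure detection by
# replacing its box-predictor head, then fine-tune on document pages.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
num_classes = 3  # background + table + figure (class set assumed)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
# model is now ready for fine-tuning on labeled document-page images.
```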