IEEE International Conference on Document Analysis and Recognition: Latest Publications

Diffusion-based Document Layout Generation
IEEE International Conference on Document Analysis and Recognition. Pub Date: 2023-03-19. DOI: 10.48550/arXiv.2303.10787
Liu He, Yijuan Lu, John Corring, D. Florêncio, Cha Zhang
Abstract: We develop a diffusion-based approach to document layout sequence generation. Layout sequences specify the contents of a document design in an explicit format. Our diffusion-based approach works in the sequence domain rather than the image domain in order to permit more complex and realistic layouts. We also introduce a new metric, Document Earth Mover's Distance (Doc-EMD). By considering similarity between document designs of heterogeneous categories, it addresses the shortcoming of prior document metrics that can only evaluate layouts of the same category. Our empirical analysis shows that our diffusion-based approach is comparable to or outperforms previous layout generation methods across various document datasets. Moreover, our metric differentiates documents better than previous metrics in specific cases.
Citations: 3
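The paper's exact Doc-EMD formulation is not given in the abstract, but the core idea of an earth-mover-style distance between layouts can be sketched as follows. This is a simplified, hypothetical illustration: layouts are reduced to box centers and matched by brute-force optimal assignment, which only scales to small layouts; the real metric also accounts for category heterogeneity.

```python
from itertools import permutations


def box_center(box):
    """Center (x, y) of a box given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def layout_emd(layout_a, layout_b):
    """Toy earth-mover-style distance between two equal-size layouts.

    Each layout is a list of boxes (x0, y0, x1, y1). We search for the
    assignment of boxes in A to boxes in B that minimises the total
    Euclidean distance between box centers (brute force over
    permutations, so only suitable for small layouts).
    """
    if len(layout_a) != len(layout_b):
        raise ValueError("this sketch assumes equal-size layouts")
    centers_a = [box_center(b) for b in layout_a]
    centers_b = [box_center(b) for b in layout_b]
    best = float("inf")
    for perm in permutations(range(len(centers_b))):
        cost = sum(
            ((centers_a[i][0] - centers_b[j][0]) ** 2
             + (centers_a[i][1] - centers_b[j][1]) ** 2) ** 0.5
            for i, j in enumerate(perm)
        )
        best = min(best, cost)
    return best / len(layout_a)  # average per-box transport cost
```

A production implementation would replace the permutation search with the Hungarian algorithm (e.g. `scipy.optimize.linear_sum_assignment`).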
BaDLAD: A Large Multi-Domain Bengali Document Layout Analysis Dataset
IEEE International Conference on Document Analysis and Recognition. Pub Date: 2023-03-09. DOI: 10.48550/arXiv.2303.05325
Md. Istiak Hossain Shihab, Md. Rakibul Hasan, Mahfuzur Rahman Emon, Syed Mobassir Hossen, Md. Nazmuddoha Ansary, Intesur Ahmed, Fazle Rakib, Shahriar Elahi Dhruvo, Souhardya Saha Dip, Akib Hasan Pavel, Marsia Haque Meghla, Md. Rezwanul Haque, Sayma Sultana Chowdhury, Farig Sadeque, Tahsin Reasat, Ahmed Imtiaz Humayun, Asif Sushmit
Abstract: While strides have been made in deep learning based Bengali Optical Character Recognition (OCR) over the past decade, the absence of large Document Layout Analysis (DLA) datasets has hindered the application of OCR in document transcription, e.g., transcribing historical documents and newspapers. Moreover, the rule-based DLA systems currently employed in practice are not robust to domain variations and out-of-distribution layouts. To this end, we present the first large multi-domain Bengali Document Layout Analysis dataset: BaDLAD. It contains 33,695 human-annotated document samples from six domains (i) books and magazines, (ii) public-domain government documents, (iii) liberation war documents, (iv) newspapers, (v) historical newspapers, and (vi) property deeds, with 710K polygon annotations for four unit types: text-box, paragraph, image, and table. Through preliminary experiments benchmarking existing state-of-the-art deep learning architectures for English DLA, we demonstrate the efficacy of our dataset for training deep learning based Bengali document digitization models.
Citations: 7
Aligning benchmark datasets for table structure recognition
IEEE International Conference on Document Analysis and Recognition. Pub Date: 2023-03-01. DOI: 10.48550/arXiv.2303.00716
B. Smock, Rohith Pesala, Robin Abraham
Abstract: Benchmark datasets for table structure recognition (TSR) must be carefully processed to ensure they are annotated consistently. However, even if a dataset's annotations are self-consistent, there may be significant inconsistency across datasets, which can harm the performance of models trained and evaluated on them. In this work, we show that aligning these benchmarks (removing both errors and inconsistency between them) improves model performance significantly. We demonstrate this through a data-centric approach where we adopt one model architecture, the Table Transformer (TATR), that we hold fixed throughout. Baseline exact match accuracy for TATR evaluated on the ICDAR-2013 benchmark is 65% when trained on PubTables-1M, 42% when trained on FinTabNet, and 69% combined. After reducing annotation mistakes and inter-dataset inconsistency, performance of TATR evaluated on ICDAR-2013 increases substantially to 75% when trained on PubTables-1M, 65% when trained on FinTabNet, and 81% combined. We show through ablations over the modification steps that canonicalization of the table annotations has a significantly positive effect on performance, while other choices balance necessary trade-offs that arise when deciding a benchmark dataset's final composition. Overall, we believe our work has significant implications for benchmark design for TSR and potentially other tasks as well. Dataset processing and training code will be released at https://github.com/microsoft/table-transformer.
Citations: 1
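The exact match accuracy reported above is a strict metric: a predicted table counts only if every cell matches the gold annotation. A minimal sketch, assuming tables are represented as grids (lists of rows of cell strings):

```python
def exact_match_accuracy(predictions, ground_truths):
    """Fraction of tables whose predicted cell grid matches the gold
    grid exactly (every row and every cell, after whitespace stripping).

    Each table is a list of rows; each row is a list of cell strings.
    """
    def normalize(table):
        return tuple(tuple(cell.strip() for cell in row) for row in table)

    matches = sum(
        normalize(pred) == normalize(gold)
        for pred, gold in zip(predictions, ground_truths)
    )
    return matches / len(ground_truths)
```

Because a single mis-segmented cell fails the whole table, this metric is very sensitive to the annotation inconsistencies the paper sets out to remove.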
A Benchmark of Nested Named Entity Recognition Approaches in Historical Structured Documents
IEEE International Conference on Document Analysis and Recognition. Pub Date: 2023-02-20. DOI: 10.48550/arXiv.2302.10204
Solenn Tual, N. Abadie, J. Chazalon, Bertrand Duménieu, Edwin Carlinet
Abstract: Named Entity Recognition (NER) is a key step in the creation of structured data from digitised historical documents. Traditional NER approaches deal with flat named entities, whereas entities are often nested. For example, a postal address might contain a street name and a number. This work compares three nested NER approaches, including two state-of-the-art approaches using Transformer-based architectures. We introduce a new Transformer-based approach based on joint labelling and semantic weighting of errors, evaluated on a collection of 19th-century Paris trade directories. We evaluate the approaches regarding the impact of supervised fine-tuning, unsupervised pre-training with noisy texts, and variation of IOB tagging formats. Our results show that while nested NER approaches enable extracting structured data directly, they do not benefit from the extra knowledge provided during training and reach a performance similar to the base approach on flat entities. Even though all three approaches perform well in terms of F1 scores, joint labelling is most suitable for hierarchically structured data. Finally, our experiments reveal the superiority of the IO tagging format on such data.
Citations: 1
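The joint-labelling idea for nested entities can be illustrated with a small sketch. This is a hypothetical simplification, not the paper's implementation: each token gets a single IO tag that joins every entity type covering it, so nested spans (a street inside an address) collapse into one label sequence a flat tagger can predict.

```python
def joint_io_tags(tokens, entities):
    """Joint IO labelling for nested entities.

    `entities` maps an entity type to a (start, end) token span
    (end exclusive). Overlapping spans are joined with '+', so a token
    inside both an ADDRESS span and a nested STREET span is tagged
    'I-ADDRESS+STREET'; tokens outside every span get 'O'.
    """
    tags = []
    for i, _ in enumerate(tokens):
        covering = [etype for etype, (s, e) in entities.items() if s <= i < e]
        tags.append("I-" + "+".join(covering) if covering else "O")
    return tags
```

IO tagging drops the B-/I- distinction of IOB, which the paper finds works better on this kind of structured directory data, at the cost of not separating adjacent same-type entities.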
DocILE Benchmark for Document Information Localization and Extraction
IEEE International Conference on Document Analysis and Recognition. Pub Date: 2023-02-11. DOI: 10.48550/arXiv.2302.05658
Štěpán Šimsa, Milan Šulc, Michal Uřičář, Yash J. Patel, Ahmed Hamdi, Matěj Kocián, Matyáš Skalický, Jiří Matas, Antoine Doucet, Mickaël Coustaty, Dimosthenis Karatzas
Abstract: This paper introduces the DocILE benchmark with the largest dataset of business documents for the tasks of Key Information Localization and Extraction and Line Item Recognition. It contains 6.7k annotated business documents, 100k synthetically generated documents, and nearly 1M unlabeled documents for unsupervised pre-training. The dataset has been built with knowledge of domain- and task-specific aspects, resulting in the following key features: (i) annotations in 55 classes, which surpasses the granularity of previously published key information extraction datasets by a large margin; (ii) Line Item Recognition represents a highly practical information extraction task, where key information has to be assigned to items in a table; (iii) documents come from numerous layouts, and the test set includes zero- and few-shot cases as well as layouts commonly seen in the training set. The benchmark comes with several baselines, including RoBERTa, LayoutLMv3, and a DETR-based Table Transformer, applied to both tasks of the DocILE benchmark, with results shared in this paper, offering a quick starting point for future work. The dataset, baselines, and supplementary material are available at https://github.com/rossumai/docile.
Citations: 6
On Web-based Visual Corpus Construction for Visual Document Understanding
IEEE International Conference on Document Analysis and Recognition. Pub Date: 2022-11-07. DOI: 10.1007/978-3-031-41682-8_19
Donghyun Kim, Teakgyu Hong, Moonbin Yim, Yoon Kim, Geewook Kim
Citations: 2
TPFNet: A Novel Text In-painting Transformer for Text Removal
IEEE International Conference on Document Analysis and Recognition. Pub Date: 2022-10-26. DOI: 10.48550/arXiv.2210.14461
Onkar Susladkar, Dhruv Makwana, Gayatri Deshmukh, Sparsh Mittal, R. S. Teja, Rekha Singhal
Abstract: Text erasure from an image is helpful for various tasks such as image editing and privacy preservation. In this paper, we present TPFNet, a novel one-stage (end-to-end) network for text removal from images. Our network has two parts: feature synthesis and image generation. Since noise can be more effectively removed from low-resolution images, part 1 operates on low-resolution images and outputs a low-resolution text-free image. Part 2 uses the features learned in part 1 to predict a high-resolution text-free image. In part 1, we use a "pyramidal vision transformer" (PVT) as the encoder. Further, we use a novel multi-headed decoder that generates a high-pass filtered image and a segmentation map, in addition to a text-free image. The segmentation branch helps locate the text precisely, and the high-pass branch helps in learning the image structure. To precisely locate the text, TPFNet employs an adversarial loss that is conditional on the segmentation map rather than the input image. On the Oxford, SCUT, and SCUT-EnsText datasets, our network outperforms recently proposed networks on nearly all the metrics. For example, on the SCUT-EnsText dataset, TPFNet has a PSNR (higher is better) of 39.0 and text-detection precision (lower is better) of 21.1, compared to the best previous technique, which has a PSNR of 32.3 and a precision of 53.2. The source code can be obtained from https://github.com/CandleLabAI/TPFNet
Citations: 0
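The PSNR figures quoted above follow the standard definition, which is worth stating concretely: PSNR = 10 · log10(MAX² / MSE), where MAX is the peak pixel value and MSE the mean squared error between the reference and the reconstruction. A minimal sketch on flat grayscale pixel lists:

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-size
    grayscale images given as flat lists of pixel values.
    """
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher PSNR means the text-free output is closer to the ground-truth clean image, which is why 39.0 dB versus 32.3 dB is a substantial gap.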
Augraphy: A Data Augmentation Library for Document Images
IEEE International Conference on Document Analysis and Recognition. Pub Date: 2022-08-30. DOI: 10.48550/arXiv.2208.14558
Samay Maini, Alexander Groleau, Kok Wei Chee, Stefan Larson, Jonathan Boarman
Abstract: This paper introduces Augraphy, a Python library for constructing data augmentation pipelines which produce distortions commonly seen in real-world document image datasets. Augraphy stands apart from other data augmentation tools by providing many different strategies to produce augmented versions of clean document images that appear as if they have been altered by standard office operations, such as printing, scanning, and faxing through old or dirty machines, degradation of ink over time, and handwritten markings. This paper discusses the Augraphy tool and shows how it can be used both as a data augmentation tool for producing diverse training data for tasks such as document denoising, and also for generating challenging test data to evaluate model robustness on document image modeling tasks.
Citations: 6
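Augraphy's actual pipeline API is not shown in the abstract, so the flavour of this kind of augmentation can be sketched generically. The function below is a hypothetical stand-in, not the library's interface: it flips a fraction of pixels to random dark specks, mimicking dirt introduced by an old scanner or copier.

```python
import random


def add_speckle(image, noise_level=0.05, seed=None):
    """Return a copy of a grayscale image (list of rows of 0-255 ints)
    with a fraction `noise_level` of pixels replaced by random dark
    specks, as a toy model of scanner/copier dirt.
    """
    rng = random.Random(seed)  # seeded for reproducible augmentations
    noisy = []
    for row in image:
        new_row = []
        for pixel in row:
            if rng.random() < noise_level:
                new_row.append(rng.randint(0, 60))  # dark speck
            else:
                new_row.append(pixel)
        noisy.append(new_row)
    return noisy
```

Real pipelines chain many such effects (ink bleed, paper texture, fax banding) and apply them on the fly during training, so a denoising model never sees the same degraded page twice.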
Robustness Evaluation of Transformer-based Form Field Extractors via Form Attacks
IEEE International Conference on Document Analysis and Recognition. Pub Date: 2021-10-08. DOI: 10.1007/978-3-031-41679-8_10
Le Xue, M. Gao, Zeyuan Chen, Caiming Xiong, Ran Xu
Citations: 2
ICDAR 2021 Competition on Scene Video Text Spotting
IEEE International Conference on Document Analysis and Recognition. Pub Date: 2021-07-26. DOI: 10.1007/978-3-030-86337-1_43
Zhanzhan Cheng, Jing Lu, Baorui Zou, Shuigeng Zhou, Fei Wu
Citations: 2