Title: Diffusion-based Document Layout Generation
Authors: Liu He, Yijuan Lu, John Corring, D. Florêncio, Cha Zhang
DOI: https://doi.org/10.48550/arXiv.2303.10787
Published: 2023-03-19, IEEE International Conference on Document Analysis and Recognition
Abstract: We develop a diffusion-based approach for document layout sequence generation. Layout sequences specify the contents of a document design in an explicit format. Our novel diffusion-based approach works in the sequence domain rather than the image domain in order to permit more complex and realistic layouts. We also introduce a new metric, Document Earth Mover's Distance (Doc-EMD). By considering similarity between document designs of heterogeneous categories, it addresses the shortcoming of prior document metrics, which only evaluate layouts of the same category. Our empirical analysis shows that our diffusion-based approach is comparable to or outperforms previous methods for layout generation across various document datasets. Moreover, our metric differentiates documents better than previous metrics in specific cases.
{"title":"BaDLAD: A Large Multi-Domain Bengali Document Layout Analysis Dataset","authors":"Md. Istiak Hossain Shihab, Md. Rakibul Hasan, Mahfuzur Rahman Emon, Syed Mobassir Hossen, Md. Nazmuddoha Ansary, Intesur Ahmed, Fazle Rakib, Shahriar Elahi Dhruvo, Souhardya Saha Dip, Akib Hasan Pavel, Marsia Haque Meghla, Md. Rezwanul Haque1, Sayma Sultana Chowdhury, Farig Sadeque, Tahsin Reasat, Ahmed Imtiaz Humayun, Asif Sushmit","doi":"10.48550/arXiv.2303.05325","DOIUrl":"https://doi.org/10.48550/arXiv.2303.05325","url":null,"abstract":"While strides have been made in deep learning based Bengali Optical Character Recognition (OCR) in the past decade, the absence of large Document Layout Analysis (DLA) datasets has hindered the application of OCR in document transcription, e.g., transcribing historical documents and newspapers. Moreover, rule-based DLA systems that are currently being employed in practice are not robust to domain variations and out-of-distribution layouts. To this end, we present the first multidomain large Bengali Document Layout Analysis Dataset: BaDLAD. This dataset contains 33,695 human annotated document samples from six domains - i) books and magazines, ii) public domain govt. documents, iii) liberation war documents, iv) newspapers, v) historical newspapers, and vi) property deeds, with 710K polygon annotations for four unit types: text-box, paragraph, image, and table. Through preliminary experiments benchmarking the performance of existing state-of-the-art deep learning architectures for English DLA, we demonstrate the efficacy of our dataset in training deep learning based Bengali document digitization models.","PeriodicalId":294655,"journal":{"name":"IEEE International Conference on Document Analysis and Recognition","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130082144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Aligning benchmark datasets for table structure recognition","authors":"B. Smock, Rohith Pesala, Robin Abraham","doi":"10.48550/arXiv.2303.00716","DOIUrl":"https://doi.org/10.48550/arXiv.2303.00716","url":null,"abstract":"Benchmark datasets for table structure recognition (TSR) must be carefully processed to ensure they are annotated consistently. However, even if a dataset's annotations are self-consistent, there may be significant inconsistency across datasets, which can harm the performance of models trained and evaluated on them. In this work, we show that aligning these benchmarks$unicode{x2014}$removing both errors and inconsistency between them$unicode{x2014}$improves model performance significantly. We demonstrate this through a data-centric approach where we adopt one model architecture, the Table Transformer (TATR), that we hold fixed throughout. Baseline exact match accuracy for TATR evaluated on the ICDAR-2013 benchmark is 65% when trained on PubTables-1M, 42% when trained on FinTabNet, and 69% combined. After reducing annotation mistakes and inter-dataset inconsistency, performance of TATR evaluated on ICDAR-2013 increases substantially to 75% when trained on PubTables-1M, 65% when trained on FinTabNet, and 81% combined. We show through ablations over the modification steps that canonicalization of the table annotations has a significantly positive effect on performance, while other choices balance necessary trade-offs that arise when deciding a benchmark dataset's final composition. Overall we believe our work has significant implications for benchmark design for TSR and potentially other tasks as well. Dataset processing and training code will be released at https://github.com/microsoft/table-transformer.","PeriodicalId":294655,"journal":{"name":"IEEE International Conference on Document Analysis and Recognition","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127661086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: A Benchmark of Nested Named Entity Recognition Approaches in Historical Structured Documents
Authors: Solenn Tual, N. Abadie, J. Chazalon, Bertrand Duménieu, Edwin Carlinet
DOI: https://doi.org/10.48550/arXiv.2302.10204
Published: 2023-02-20, IEEE International Conference on Document Analysis and Recognition
Abstract: Named Entity Recognition (NER) is a key step in the creation of structured data from digitised historical documents. Traditional NER approaches deal with flat named entities, whereas entities are often nested. For example, a postal address might contain a street name and a number. This work compares three nested NER approaches, including two state-of-the-art approaches using Transformer-based architectures. We introduce a new Transformer-based approach based on joint labelling and semantic weighting of errors, evaluated on a collection of 19th-century Paris trade directories. We evaluate the approaches with regard to the impact of supervised fine-tuning, unsupervised pre-training on noisy texts, and variation of IOB tagging formats. Our results show that while nested NER approaches enable extracting structured data directly, they do not benefit from the extra knowledge provided during training and reach a performance similar to the base approach on flat entities. Even though all three approaches perform well in terms of F1 score, joint labelling is most suitable for hierarchically structured data. Finally, our experiments reveal the superiority of the IO tagging format on such data.
Title: DocILE Benchmark for Document Information Localization and Extraction
Authors: Štěpán Šimsa, Milan Šulc, Michal Uřičář, Yash J. Patel, Ahmed Hamdi, Matěj Kocián, Matyáš Skalický, Jiří Matas, Antoine Doucet, Mickaël Coustaty, Dimosthenis Karatzas
DOI: https://doi.org/10.48550/arXiv.2302.05658
Published: 2023-02-11, IEEE International Conference on Document Analysis and Recognition
Abstract: This paper introduces the DocILE benchmark with the largest dataset of business documents for the tasks of Key Information Localization and Extraction and Line Item Recognition. It contains 6.7k annotated business documents, 100k synthetically generated documents, and nearly 1M unlabeled documents for unsupervised pre-training. The dataset has been built with knowledge of domain- and task-specific aspects, resulting in the following key features: (i) annotations in 55 classes, which surpasses the granularity of previously published key information extraction datasets by a large margin; (ii) Line Item Recognition represents a highly practical information extraction task, where key information has to be assigned to items in a table; (iii) documents come from numerous layouts, and the test set includes zero- and few-shot cases as well as layouts commonly seen in the training set. The benchmark comes with several baselines, including RoBERTa, LayoutLMv3, and a DETR-based Table Transformer, applied to both tasks of the benchmark, with results shared in this paper, offering a quick starting point for future work. The dataset, baselines, and supplementary material are available at https://github.com/rossumai/docile.
Title: On Web-based Visual Corpus Construction for Visual Document Understanding
Authors: Donghyun Kim, Teakgyu Hong, Moonbin Yim, Yoon Kim, Geewook Kim
DOI: https://doi.org/10.1007/978-3-031-41682-8_19
Published: 2022-11-07, IEEE International Conference on Document Analysis and Recognition
Title: TPFNet: A Novel Text In-painting Transformer for Text Removal
Authors: Onkar Susladkar, Dhruv Makwana, Gayatri Deshmukh, Sparsh Mittal, R. S. Teja, Rekha Singhal
DOI: https://doi.org/10.48550/arXiv.2210.14461
Published: 2022-10-26, IEEE International Conference on Document Analysis and Recognition
Abstract: Text erasure from an image is helpful for various tasks such as image editing and privacy preservation. In this paper, we present TPFNet, a novel one-stage (end-to-end) network for text removal from images. Our network has two parts: feature synthesis and image generation. Since noise can be more effectively removed from low-resolution images, part 1 operates on low-resolution images and outputs a low-resolution text-free image. Part 2 uses the features learned in part 1 to predict a high-resolution text-free image. In part 1, we use a "pyramidal vision transformer" (PVT) as the encoder. Further, we use a novel multi-headed decoder that generates a high-pass filtered image and a segmentation map, in addition to a text-free image. The segmentation branch helps locate the text precisely, and the high-pass branch helps in learning the image structure. To precisely locate the text, TPFNet employs an adversarial loss that is conditional on the segmentation map rather than the input image. On the Oxford, SCUT, and SCUT-EnsText datasets, our network outperforms recently proposed networks on nearly all the metrics. For example, on the SCUT-EnsText dataset, TPFNet has a PSNR (higher is better) of 39.0 and text-detection precision (lower is better) of 21.1, compared to the best previous technique, which has a PSNR of 32.3 and a precision of 53.2. The source code can be obtained from https://github.com/CandleLabAI/TPFNet.
Title: Augraphy: A Data Augmentation Library for Document Images
Authors: Samay Maini, Alexander Groleau, Kok Wei Chee, Stefan Larson, Jonathan Boarman
DOI: https://doi.org/10.48550/arXiv.2208.14558
Published: 2022-08-30, IEEE International Conference on Document Analysis and Recognition
Abstract: This paper introduces Augraphy, a Python library for constructing data augmentation pipelines that produce distortions commonly seen in real-world document image datasets. Augraphy stands apart from other data augmentation tools by providing many different strategies to produce augmented versions of clean document images that appear as if they have been altered by standard office operations, such as printing, scanning, and faxing through old or dirty machines, degradation of ink over time, and handwritten markings. This paper discusses the Augraphy tool and shows how it can be used both to produce diverse training data for tasks such as document denoising and to generate challenging test data for evaluating model robustness on document image modeling tasks.
Title: Robustness Evaluation of Transformer-based Form Field Extractors via Form Attacks
Authors: Le Xue, M. Gao, Zeyuan Chen, Caiming Xiong, Ran Xu
DOI: https://doi.org/10.1007/978-3-031-41679-8_10
Published: 2021-10-08, IEEE International Conference on Document Analysis and Recognition
{"title":"ICDAR 2021 Competition on Scene Video Text Spotting","authors":"Zhanzhan Cheng, Jing Lu, Baorui Zou, Shuigeng Zhou, Fei Wu","doi":"10.1007/978-3-030-86337-1_43","DOIUrl":"https://doi.org/10.1007/978-3-030-86337-1_43","url":null,"abstract":"","PeriodicalId":294655,"journal":{"name":"IEEE International Conference on Document Analysis and Recognition","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130975656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}