Title: Topic Shift Detection in Chinese Dialogues: Corpus and Benchmark
Authors: Jian-Dong Lin, Yaxin Fan, Feng Jiang, Xiaomin Chu, Peifeng Li
DOI: https://doi.org/10.48550/arXiv.2305.01195
Published: 2023-05-02

Abstract: Dialogue topic shift detection determines whether an ongoing topic has shifted, or should shift, in a dialogue. The problem divides into two categories: the response-known task and the response-unknown task. Only a few studies have investigated the latter, because predicting a topic shift without the response information remains a challenge. In this paper, we first annotate a Chinese Natural Topic Dialogue (CNTD) corpus of 1308 dialogues to fill the gap in Chinese natural-conversation topic corpora. We then focus on the response-unknown task and propose a teacher-student framework based on hierarchical contrastive learning to predict topic shifts without the response. Specifically, the response is introduced at the high level to build contrastive learning between the response and the context across the teacher-student pair, while label contrastive learning is constructed at the low level within the student. Experimental results on our Chinese CNTD and the English TIAGE corpus show the effectiveness of the proposed model.
Title: Information Redundancy and Biases in Public Document Information Extraction Benchmarks
Authors: Seif Laatiri, Pirashanth Ratnamogan, Joel Tang, Laurent Lam, William Vanhuffel, Fabien Caspani
DOI: https://doi.org/10.48550/arXiv.2304.14936
Published: 2023-04-28

Abstract: Advances in Visually-rich Document Understanding (VrDU), and in the Key Information Extraction (KIE) task in particular, are marked by the emergence of efficient Transformer-based approaches such as the LayoutLM models. Despite their good performance when fine-tuned on public benchmarks, KIE models still struggle to generalize to complex real-life use cases that lack sufficient document annotations. Our research shows that standard KIE benchmarks such as SROIE and FUNSD contain significant similarity between training and testing documents and can be adjusted to better evaluate model generalization. We designed experiments to quantify the information redundancy in these public benchmarks, revealing 75% template replication in the official SROIE test set and 16% in FUNSD. We also propose resampling strategies that yield benchmarks more representative of models' generalization ability. We show that models not suited for document analysis struggle on the adjusted splits, dropping on average 10.5% F1 on SROIE and 3.5% on FUNSD, whereas multi-modal models drop only 7.5% F1 on SROIE and 0.5% on FUNSD.
Title: CCpdf: Building a High Quality Corpus for Visually Rich Documents from Web Crawl Data
Authors: M. Turski, Tomasz Stanislawek, Karol Kaczmarek, Pawel Dyda, Filip Graliński
DOI: https://doi.org/10.48550/arXiv.2304.14953
Published: 2023-04-28

Abstract: The field of document understanding has progressed substantially in recent years, in large part thanks to language models pretrained on large collections of documents. However, the pretraining corpora used in document understanding are single-domain, monolingual, or non-public. In this paper we propose an efficient pipeline for creating a large-scale, diverse, multilingual corpus of PDF files from across the Internet using Common Crawl, since PDF is the most canonical document format considered in document understanding. We extensively analysed every step of the pipeline and propose a solution that trades off data quality against processing time. We also release the CCpdf corpus in the form of an index of PDF files, along with a script for downloading them, producing a collection useful for language-model pretraining. The dataset and tools published with this paper give researchers the opportunity to develop even better multilingual language models.
{"title":"Vision Conformer: Incorporating Convolutions into Vision Transformer Layers","authors":"Brian Kenji Iwana, Akihiro Kusuda","doi":"10.48550/arXiv.2304.13991","DOIUrl":"https://doi.org/10.48550/arXiv.2304.13991","url":null,"abstract":"Transformers are popular neural network models that use layers of self-attention and fully-connected nodes with embedded tokens. Vision Transformers (ViT) adapt transformers for image recognition tasks. In order to do this, the images are split into patches and used as tokens. One issue with ViT is the lack of inductive bias toward image structures. Because ViT was adapted for image data from language modeling, the network does not explicitly handle issues such as local translations, pixel information, and information loss in the structures and features shared by multiple patches. Conversely, Convolutional Neural Networks (CNN) incorporate this information. Thus, in this paper, we propose the use of convolutional layers within ViT. Specifically, we propose a model called a Vision Conformer (ViC) which replaces the Multi-Layer Perceptron (MLP) in a ViT layer with a CNN. In addition, to use the CNN, we proposed to reconstruct the image data after the self-attention in a reverse embedding layer. Through the evaluation, we demonstrate that the proposed convolutions help improve the classification ability of ViT.","PeriodicalId":294655,"journal":{"name":"IEEE International Conference on Document Analysis and Recognition","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129002030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Contour Completion by Transformers and Its Application to Vector Font Data","authors":"Yusuke Nagata, Brian Kenji Iwana, S. Uchida","doi":"10.48550/arXiv.2304.13988","DOIUrl":"https://doi.org/10.48550/arXiv.2304.13988","url":null,"abstract":"In documents and graphics, contours are a popular format to describe specific shapes. For example, in the True Type Font (TTF) file format, contours describe vector outlines of typeface shapes. Each contour is often defined as a sequence of points. In this paper, we tackle the contour completion task. In this task, the input is a contour sequence with missing points, and the output is a generated completed contour. This task is more difficult than image completion because, for images, the missing pixels are indicated. Since there is no such indication in the contour completion task, we must solve the problem of missing part detection and completion simultaneously. We propose a Transformer-based method to solve this problem and show the results of the typeface contour completion.","PeriodicalId":294655,"journal":{"name":"IEEE International Conference on Document Analysis and Recognition","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129728089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Key-value information extraction from full handwritten pages
Authors: Solène Tarride, Mélodie Boillet, Christopher Kermorvant
DOI: https://doi.org/10.48550/arXiv.2304.13530
Published: 2023-04-26

Abstract: We propose a Transformer-based approach for information extraction from digitized handwritten documents. Our approach combines, in a single model, steps that have so far been performed by separate models: feature extraction, handwriting recognition, and named entity recognition. We compare this integrated approach with traditional two-stage methods that perform handwriting recognition before named entity recognition, and present results at the line, paragraph, and page levels. Our experiments show that attention-based models are especially interesting when applied to full pages, as they require no prior segmentation step. Finally, we show that they can learn from key-value annotations: a list of important words with their corresponding named entities. We compare our models with state-of-the-art methods on three public databases (IAM, ESPOSALLES, and POPP) and outperform the previous best results on all three datasets.
Title: Structure Diagram Recognition in Financial Announcements
Authors: Meixuan Qiao, Jun Wang, Junfu Xiang, Qiyu Hou, Ruixuan Li
DOI: https://doi.org/10.48550/arXiv.2304.13240
Published: 2023-04-26

Abstract: Accurately extracting structured data from structure diagrams in financial announcements is of great practical importance for building financial knowledge graphs and improving the efficiency of various financial applications. First, we propose a new method for recognizing structure diagrams in financial announcements that better detects and extracts different types of connecting lines, including straight lines, curves, and polylines of different orientations and angles. Second, we develop a two-stage method to efficiently generate the industry's first benchmark of structure diagrams from Chinese financial announcements: a large number of diagrams are synthesized and annotated with an automated tool to train a preliminary recognition model with reasonably good performance, after which a high-quality benchmark is obtained by automatically annotating real-world structure diagrams with the preliminary model and applying a few manual corrections. Finally, we experimentally verify the significant performance advantage of our structure diagram recognition method over previous methods.
Title: Information Extraction from Documents: Question Answering vs Token Classification in real-world setups
Authors: Laurent Lam, Pirashanth Ratnamogan, Joel Tang, William Vanhuffel, Fabien Caspani
DOI: https://doi.org/10.48550/arXiv.2304.10994
Published: 2023-04-21

Abstract: Research in Document Intelligence, and especially in Document Key Information Extraction (DocKIE), has mainly treated the problem as token classification. Recent breakthroughs in both natural language processing (NLP) and computer vision have helped build document-focused pre-training methods that leverage a multimodal understanding of a document's text, layout, and image modalities. These breakthroughs have also led to the emergence of a new DocKIE subtask, extractive document Question Answering (DocQA), within the Machine Reading Comprehension (MRC) research field. In this work, we compare the Question Answering approach with the classical token classification approach for document key information extraction. We designed experiments to benchmark five setups: raw performance, robustness to noisy environments, capacity to extract long entities, fine-tuning speed in few-shot learning, and zero-shot learning. Our research shows that the token classification-based approach remains best for clean and relatively short entities, while the QA approach can be a good alternative for noisy environments or long-entity use cases.
Title: A Study on Reproducibility and Replicability of Table Structure Recognition Methods
Authors: Kehinde E. Ajayi, Muntabhir Hasan Choudhury, S. Rajtmajer, Jian Wu
DOI: https://doi.org/10.48550/arXiv.2304.10439
Published: 2023-04-20

Abstract: Concerns about reproducibility in artificial intelligence (AI) have emerged as researchers report unsuccessful attempts to directly reproduce published findings in the field. Replicability, the ability to affirm a finding using the same procedures on new data, has not been well studied. In this paper, we examine both the reproducibility and the replicability of a corpus of 16 papers on table structure recognition (TSR), an AI task aimed at identifying cell locations in tables in digital documents. We attempt to reproduce published results using the code and datasets provided by the original authors. We then examine replicability using a dataset similar to the original as well as a new dataset, GenTSR, consisting of 386 annotated tables extracted from scientific papers. Of the 16 papers studied, we reproduce results consistent with the original in only four. Two of those four papers are identified as replicable on the similar dataset under certain IoU thresholds. No paper is identified as replicable on the new dataset. We offer observations on the causes of irreproducibility and irreplicability. All code and data are available on CodeOcean at https://codeocean.com/capsule/6680116/tree.
Title: WordStylist: Styled Verbatim Handwritten Text Generation with Latent Diffusion Models
Authors: Konstantina Nikolaidou, George Retsinas, V. Christlein, Mathias Seuret, Giorgos Sfikas, Elisa Barney Smith, Hamam Mokayed, M. Liwicki
DOI: https://doi.org/10.48550/arXiv.2303.16576
Published: 2023-03-29

Abstract: Text-to-image synthesis is the task of generating an image according to a specific text description. Generative Adversarial Networks have been considered the standard method for image synthesis virtually since their introduction; Denoising Diffusion Probabilistic Models have recently set a new baseline, with remarkable results in text-to-image synthesis among other fields. Aside from its usefulness per se, such synthesis is also particularly relevant as a data-augmentation tool for training models on other document image processing tasks. In this work, we present a latent diffusion-based method for styled text-to-text-content image generation at the word level. The proposed method generates realistic word-image samples in different writer styles, using class-index styles and text-content prompts, without the need for adversarial training, writer recognition, or text recognition. We gauge system performance with the Fréchet Inception Distance, writer recognition accuracy, and writer retrieval. We show that the proposed model produces samples that are aesthetically pleasing, help boost text recognition performance, and achieve writer retrieval scores similar to real data. Code is available at: https://github.com/koninik/WordStylist.