End-to-end semi-supervised approach with modulated object queries for table detection in documents

IF 2.5 · CAS Zone 4 (Computer Science) · JCR Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal on Document Analysis and Recognition · Pub Date: 2024-07-10 · DOI: 10.1007/s10032-024-00471-0

Iqraa Ehsan, Tahira Shehzadi, Didier Stricker, Muhammad Zeshan Afzal


Citations: 0

Abstract

Table detection, a pivotal task in document analysis, aims to precisely recognize and locate tables within document images. Although deep learning has made remarkable progress on this task, it typically requires a large amount of labeled data for effective training. Current CNN-based semi-supervised table detection approaches rely on anchor generation and non-maximum suppression in their detection pipeline, which limits training efficiency. Meanwhile, transformer-based semi-supervised techniques adopt a one-to-one matching strategy that yields noisy pseudo-labels, limiting overall efficiency. This study presents a transformer-based semi-supervised table detector. It improves the quality of pseudo-labels through a novel matching strategy that combines one-to-one and one-to-many assignment. This approach significantly improves training efficiency during the early stages, providing better pseudo-labels for further training. Our semi-supervised approach is comprehensively evaluated on benchmark datasets, including PubLayNet, ICDAR-19, and TableBank. It achieves new state-of-the-art results, with mAP of 95.7% on TableBank (word) and 97.9% on PubLayNet using 30% labeled data, improvements of 7.4 and 7.6 points, respectively, over the previous semi-supervised table detection approach. The results show that our semi-supervised approach surpasses all existing state-of-the-art methods by substantial margins. This research represents a significant advancement in semi-supervised table detection methods, offering a more efficient and accurate solution for practical document analysis tasks.
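To make the contrast between the two assignment schemes concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the matching cost, the number of matches per target, or how the two schemes are combined. This sketch assumes an IoU-only cost, a brute-force search in place of the Hungarian algorithm, and a hypothetical step-based switch (`mixed_match`, `switch_step`, `k`, and `min_iou` are illustrative names and values, not taken from the paper).

```python
from itertools import permutations


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def one_to_one(preds, targets):
    """Give each target exactly one distinct prediction, maximizing total IoU.

    Brute force over permutations, so only suitable for toy inputs.
    DETR-style detectors typically use the Hungarian algorithm instead.
    """
    if len(targets) > len(preds):
        raise ValueError("need at least as many predictions as targets")
    best, best_score = (), -1.0
    for perm in permutations(range(len(preds)), len(targets)):
        score = sum(iou(preds[p], targets[t]) for t, p in enumerate(perm))
        if score > best_score:
            best, best_score = perm, score
    return {t: [p] for t, p in enumerate(best)}


def one_to_many(preds, targets, k=2, min_iou=0.5):
    """Give each target up to k predictions whose IoU is at least min_iou."""
    matches = {}
    for t, target in enumerate(targets):
        ranked = sorted(
            range(len(preds)),
            key=lambda p: iou(preds[p], target),
            reverse=True,
        )
        matches[t] = [p for p in ranked[:k] if iou(preds[p], target) >= min_iou]
    return matches


def mixed_match(preds, targets, step, switch_step, k=2):
    """Hypothetical schedule: one-to-many early in training, one-to-one later."""
    if step < switch_step:
        return one_to_many(preds, targets, k=k)
    return one_to_one(preds, targets)


# Toy example: two overlapping predictions near one table, one far away.
preds = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
targets = [(0, 0, 10, 10)]
early = mixed_match(preds, targets, step=0, switch_step=100)    # {0: [0, 1]}
late = mixed_match(preds, targets, step=100, switch_step=100)   # {0: [0]}
```

In the toy example, one-to-many matching supervises both overlapping predictions for the single table, while one-to-one matching keeps only the best one. This illustrates why the former can supply more training signal per pseudo-label and the latter avoids duplicate detections without non-maximum suppression.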




Source journal

International Journal on Document Analysis and Recognition (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore

6.20

Self-citation rate

4.30%

Articles published

Review time

7.5 months

About the journal: The large number of existing documents and the production of a multitude of new ones every year raise important issues in efficient handling, retrieval and storage of these documents and the information which they contain. This has led to the emergence of new research domains dealing with the recognition by computers of the constituent elements of documents, including characters, symbols, text, lines, graphics, images, handwriting, signatures, etc. In addition, these new domains deal with automatic analyses of the overall physical and logical structures of documents, with the ultimate objective of a high-level understanding of their semantic content. We have also seen renewed interest in optical character recognition (OCR) and handwriting recognition during the last decade. Document analysis and recognition are obviously the next stage. Automatic, intelligent processing of documents is at the intersection of many fields of research, especially computer vision, image analysis, pattern recognition and artificial intelligence, as well as studies on reading, handwriting and linguistics. Although quality document-related publications continue to appear in journals dedicated to these domains, the community will benefit from having this journal as a focal point for archival literature dedicated to document analysis and recognition.