Guided Anchoring Cascade R-CNN: An intensive improvement of R-CNN in Vietnamese Document Detection

2021 8th NAFOSTED Conference on Information and Computer Science (NICS). Pub Date: 2021-12-21. DOI: 10.1109/NICS54270.2021.9701510

Hai Le, Truong-Hai Nguyen, Vy Le, Trong-Thuan Nguyen, Nguyen D. Vo, Khang Nguyen


Citations: 4

Abstract

Digital documents are gradually replacing paper documents. As a result, the need to extract information from digital documents is growing and has become a major interest in computer vision, particularly in the understanding of document images. Detecting objects in document images (figures, tables, formulas) is a prerequisite for analyzing and extracting information from documents. Previous studies have mostly focused on English documents. In this study, we experiment on UIT-DODV, a Vietnamese document image dataset with four classes: Table, Figure, Caption and Formula. We evaluate common state-of-the-art object detection models, including Double-Head R-CNN, Libra R-CNN and Guided Anchoring, and obtain the best result, 73.6% mAP, with Guided Anchoring. Because we assume that high-quality anchor boxes are key to the success of anchor-based object detection models, we adopt Guided Anchoring in our research. We also attempt to raise the quality of the predicted bounding boxes by using the Cascade R-CNN architecture, whose design makes this possible, so that we can filter out as many ambiguous bounding boxes as possible. Based on the initial evaluation results from these models, we propose an object detection model for Vietnamese document images that combines Cascade R-CNN and Guided Anchoring. The proposed model achieves up to 76.6% mAP, 2.1% higher than the baseline model on the UIT-DODV dataset.
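The abstract does not spell out how Cascade R-CNN filters bounding boxes, so the following is only a minimal sketch of the general idea rather than the authors' implementation. It assumes axis-aligned boxes in `(x1, y1, x2, y2)` form and the stage IoU thresholds 0.5, 0.6 and 0.7 that are commonly associated with Cascade R-CNN; the function names `iou` and `positives_per_stage` are hypothetical. A real cascade also refines the boxes between stages, which this sketch omits.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def positives_per_stage(
    proposals: Sequence[Box],
    ground_truth: Box,
    thresholds: Sequence[float] = (0.5, 0.6, 0.7),  # assumed stage thresholds
) -> List[List[Box]]:
    """For each stage threshold, keep the proposals that count as positives.

    Hypothetical helper: it only illustrates that later stages accept
    fewer, better-localized boxes. It does not regress or refine boxes.
    """
    return [
        [p for p in proposals if iou(p, ground_truth) >= t]
        for t in thresholds
    ]


if __name__ == "__main__":
    gt = (0.0, 0.0, 10.0, 10.0)
    candidates = [
        (0.0, 0.0, 10.0, 10.0),  # perfect overlap
        (0.0, 0.0, 10.0, 6.5),   # partial overlap
        (0.0, 0.0, 10.0, 5.5),   # weaker overlap
        (0.0, 0.0, 10.0, 4.0),   # poor overlap
    ]
    for t, kept in zip((0.5, 0.6, 0.7), positives_per_stage(candidates, gt)):
        print(f"IoU >= {t}: {len(kept)} proposals kept")
```

Under these assumptions, each successive stage keeps a smaller and better-localized subset of the candidates, which is the sense in which a cascade can raise the quality of the final boxes.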

