taka at the FinSBD-3 task: Tables and Figures Extraction using Object Detection Techniques

Companion Proceedings of the Web Conference 2021. Pub Date: 2021-04-19. DOI: 10.1145/3442442.3451379

Tien-Dung Le

Citations: 1

Abstract

FinSBD-3 is a shared task organized in the context of the 1st workshop on Financial Technology on the Web. The task focuses on extracting the entire structure of noisy PDF financial documents, including 1) sentences, lists, items, and the organization of lists and items; 2) figures and tables; 3) headers and footers. This paper describes an approach that extracts figures and tables using their visual cues. We applied object segmentation techniques from image processing to detect the locations of figures and tables in the PDF files. A post-processing method is then executed to find the exact content. The results show the potential of this approach.
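
The abstract does not say how the post-processing step recovers the content of a detected region. As an illustration only, the sketch below assumes a detector has already returned scored bounding boxes for tables and figures, and that word-level boxes are available from the PDF text layer. It then keeps the words whose centers fall inside each confident detection. The names `Word`, `Detection`, `extract_regions`, and the `min_score` threshold are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (x0, y0, x1, y1) in page coordinates; this layout is an assumption.
Box = Tuple[float, float, float, float]


@dataclass
class Word:
    text: str
    box: Box


@dataclass
class Detection:
    label: str    # e.g. "table" or "figure"
    box: Box
    score: float  # detector confidence in [0, 1]


def center_inside(inner: Box, outer: Box) -> bool:
    """Return True if the center of `inner` lies within `outer`."""
    cx = (inner[0] + inner[2]) / 2
    cy = (inner[1] + inner[3]) / 2
    return outer[0] <= cx <= outer[2] and outer[1] <= cy <= outer[3]


def extract_regions(
    detections: List[Detection],
    words: List[Word],
    min_score: float = 0.5,  # hypothetical threshold, not from the paper
) -> List[Tuple[str, str]]:
    """Map each confident detection to the text of the words it contains."""
    results = []
    for det in detections:
        if det.score < min_score:
            continue  # drop low-confidence detections
        inside = [w for w in words if center_inside(w.box, det.box)]
        results.append((det.label, " ".join(w.text for w in inside)))
    return results


words = [
    Word("Revenue", (10, 10, 50, 20)),
    Word("2020", (60, 10, 90, 20)),
    Word("Footer", (10, 500, 50, 510)),
]
detections = [
    Detection("table", (0, 0, 100, 30), 0.9),
    Detection("figure", (0, 200, 100, 300), 0.2),
]
regions = extract_regions(detections, words)
```

Testing the word center rather than requiring full containment tolerates detections that clip the edge of a word, at the cost of occasionally pulling in a neighboring word. A real pipeline would likely need further rules for reading order and overlapping regions.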

