Multi-Scale Multi-Task FCN for Semantic Page Segmentation and Table Detection

2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). Pub Date: 2017-11-01. DOI: 10.1109/ICDAR.2017.50

Dafang He, Scott D. Cohen, Brian L. Price, Daniel Kifer, C. Lee Giles

Citations: 89

Abstract

Page segmentation and table detection play an important role in understanding the structure of documents. We present a page segmentation algorithm that incorporates state-of-the-art deep learning methods for segmenting three types of document elements: text blocks, tables, and figures. We propose a multi-scale, multi-task fully convolutional neural network (FCN) for the tasks of semantic page segmentation and element contour detection. The semantic segmentation network accurately predicts the probability at each pixel of the three element classes. The contour detection network accurately predicts instance-level "edges" around each element occurrence. We propose a conditional random field (CRF) that uses features output from the semantic segmentation and contour networks to improve upon the semantic segmentation network output. Given the semantic segmentation output, we also extract individual table instances from the page using heuristic rules and a verification network to remove false positives. We show that although we only consider a page image as input, we produce results comparable to those of other methods that rely on PDF file information, heuristics, and hand-crafted features tailored to specific types of documents. Our approach learns the representative features for page segmentation from real and synthetic training data. The learning-based property makes it a more general method than existing methods in terms of document types and element appearances. For example, our method reliably detects sparsely lined tables, which are hard for rule-based or heuristic methods.
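
The abstract does not spell out the heuristic rules used to pull table instances out of the segmentation output. As a purely illustrative sketch, one simple possibility is to threshold the per-pixel table probabilities, group the surviving pixels into 4-connected regions, and keep the bounding box of each region that is large enough. The function name `extract_table_boxes` and the `threshold` and `min_area` parameters are assumptions made for this example and are not taken from the paper; the paper's verification network and CRF are not modeled here.

```python
from collections import deque


def extract_table_boxes(prob_map, threshold=0.5, min_area=4):
    """Return (top, left, bottom, right) boxes of connected table regions.

    prob_map is a 2D list of per-pixel table probabilities. The threshold
    and min_area values are hypothetical, not values from the paper.
    """
    height = len(prob_map)
    width = len(prob_map[0]) if height else 0
    visited = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if visited[r][c] or prob_map[r][c] < threshold:
                continue
            # Breadth-first flood fill over 4-connected table pixels.
            queue = deque([(r, c)])
            visited[r][c] = True
            top, left, bottom, right, area = r, c, r, c, 0
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not visited[ny][nx]
                            and prob_map[ny][nx] >= threshold):
                        visited[ny][nx] = True
                        queue.append((ny, nx))
            # Small regions are treated as likely false positives and dropped.
            if area >= min_area:
                boxes.append((top, left, bottom, right))
    return boxes


# Toy 5x5 "page": two table-like blobs and one isolated noisy pixel.
page = [
    [0.9, 0.9, 0.0, 0.0, 0.0],
    [0.9, 0.9, 0.0, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.7, 0.7, 0.7, 0.0],
    [0.0, 0.7, 0.7, 0.7, 0.0],
]
boxes = extract_table_boxes(page)
print(boxes)
```

On the toy page, the isolated pixel is discarded by the area filter and the two larger blobs each yield one box. In a fuller pipeline, each candidate box would presumably be passed to a verification step before being accepted as a table.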
