基于图的页面分割的机器学习方法

2018 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI) Pub Date : 2018-10-01 DOI:10.1109/SIBGRAPI.2018.00061

A. L. L. Maia, Frank D. Julca-Aguilar, N. Hirata

{"title":"基于图的页面分割的机器学习方法","authors":"A. L. L. Maia, Frank D. Julca-Aguilar, N. Hirata","doi":"10.1109/SIBGRAPI.2018.00061","DOIUrl":null,"url":null,"abstract":"We propose a new approach for segmenting a document image into its page components (e.g. text, graphics and tables). Our approach consists of two main steps. In the first step, a set of scores corresponding to the output of a convolutional neural network, one for each of the possible page component categories, is assigned to each connected component in the document. The labeled connected components define a fuzzy over-segmentation of the page. In the second step, spatially close connected components that are likely to belong to a same page component are grouped together. This is done by building an attributed region adjacency graph of the connected components and modeling the problem as an edge removal problem. Edges are then kept or removed based on a pre-trained classifier. The resulting groups, defined by the connected subgraphs, correspond to the detected page components. We evaluate our method on the ICDAR2009 dataset. Results show that our method effectively segments pages, being able to detect the nine types of page components. Furthermore, as our approach is based on simple machine learning models and graph-based techniques, it should be easily adapted to the segmentation of a variety of document types.","PeriodicalId":208985,"journal":{"name":"2018 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"A Machine Learning Approach for Graph-Based Page Segmentation\",\"authors\":\"A. L. L. Maia, Frank D. Julca-Aguilar, N. Hirata\",\"doi\":\"10.1109/SIBGRAPI.2018.00061\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We propose a new approach for segmenting a document image into its page components (e.g. text, graphics and tables). Our approach consists of two main steps. In the first step, a set of scores corresponding to the output of a convolutional neural network, one for each of the possible page component categories, is assigned to each connected component in the document. The labeled connected components define a fuzzy over-segmentation of the page. In the second step, spatially close connected components that are likely to belong to a same page component are grouped together. This is done by building an attributed region adjacency graph of the connected components and modeling the problem as an edge removal problem. Edges are then kept or removed based on a pre-trained classifier. The resulting groups, defined by the connected subgraphs, correspond to the detected page components. We evaluate our method on the ICDAR2009 dataset. Results show that our method effectively segments pages, being able to detect the nine types of page components. Furthermore, as our approach is based on simple machine learning models and graph-based techniques, it should be easily adapted to the segmentation of a variety of document types.\",\"PeriodicalId\":208985,\"journal\":{\"name\":\"2018 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)\",\"volume\":\"85 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SIBGRAPI.2018.00061\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SIBGRAPI.2018.00061","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 4

摘要

我们提出了一种将文档图像分割成其页面组件(如文本、图形和表格)的新方法。我们的方法包括两个主要步骤。在第一步中，将一组与卷积神经网络的输出相对应的分数分配给文档中每个连接的组件，每个可能的页面组件类别对应一个分数。标记的连接组件定义了页面的模糊过分割。在第二步中，将可能属于同一页面组件的空间紧密连接的组件分组在一起。这是通过建立连接组件的属性区域邻接图并将问题建模为边缘去除问题来完成的。然后根据预训练的分类器保留或删除边缘。由连接的子图定义的结果组对应于检测到的页面组件。我们在ICDAR2009数据集上评估了我们的方法。结果表明，该方法能够有效地对页面进行分割，能够检测出9种类型的页面组件。此外，由于我们的方法是基于简单的机器学习模型和基于图的技术，它应该很容易适应各种文档类型的分割。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A Machine Learning Approach for Graph-Based Page Segmentation

We propose a new approach for segmenting a document image into its page components (e.g. text, graphics and tables). Our approach consists of two main steps. In the first step, a set of scores corresponding to the output of a convolutional neural network, one for each of the possible page component categories, is assigned to each connected component in the document. The labeled connected components define a fuzzy over-segmentation of the page. In the second step, spatially close connected components that are likely to belong to a same page component are grouped together. This is done by building an attributed region adjacency graph of the connected components and modeling the problem as an edge removal problem. Edges are then kept or removed based on a pre-trained classifier. The resulting groups, defined by the connected subgraphs, correspond to the detected page components. We evaluate our method on the ICDAR2009 dataset. Results show that our method effectively segments pages, being able to detect the nine types of page components. Furthermore, as our approach is based on simple machine learning models and graph-based techniques, it should be easily adapted to the segmentation of a variety of document types.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2018 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)

自引率

0.00%

发文量