A Graphical Approach to Document Layout Analysis

Jilin Wang, Michael Krumdick, Baojia Tong, Hamima Halim, M. Sokolov, Vadym Barda, Delphine Vendryes, Christy Tanner
{"title":"A Graphical Approach to Document Layout Analysis","authors":"Jilin Wang, Michael Krumdick, Baojia Tong, Hamima Halim, M. Sokolov, Vadym Barda, Delphine Vendryes, Christy Tanner","doi":"10.48550/arXiv.2308.02051","DOIUrl":null,"url":null,"abstract":"Document layout analysis (DLA) is the task of detecting the distinct, semantic content within a document and correctly classifying these items into an appropriate category (e.g., text, title, figure). DLA pipelines enable users to convert documents into structured machine-readable formats that can then be used for many useful downstream tasks. Most existing state-of-the-art (SOTA) DLA models represent documents as images, discarding the rich metadata available in electronically generated PDFs. Directly leveraging this metadata, we represent each PDF page as a structured graph and frame the DLA problem as a graph segmentation and classification problem. We introduce the Graph-based Layout Analysis Model (GLAM), a lightweight graph neural network competitive with SOTA models on two challenging DLA datasets - while being an order of magnitude smaller than existing models. In particular, the 4-million parameter GLAM model outperforms the leading 140M+ parameter computer vision-based model on 5 of the 11 classes on the DocLayNet dataset. A simple ensemble of these two models achieves a new state-of-the-art on DocLayNet, increasing mAP from 76.8 to 80.8. Overall, GLAM is over 5 times more efficient than SOTA models, making GLAM a favorable engineering choice for DLA tasks.","PeriodicalId":294655,"journal":{"name":"IEEE International Conference on Document Analysis and Recognition","volume":"46 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE International Conference on Document Analysis and Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2308.02051","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Document layout analysis (DLA) is the task of detecting the distinct, semantic content within a document and correctly classifying these items into an appropriate category (e.g., text, title, figure). DLA pipelines enable users to convert documents into structured machine-readable formats that can then be used for many useful downstream tasks. Most existing state-of-the-art (SOTA) DLA models represent documents as images, discarding the rich metadata available in electronically generated PDFs. Directly leveraging this metadata, we represent each PDF page as a structured graph and frame the DLA problem as a graph segmentation and classification problem. We introduce the Graph-based Layout Analysis Model (GLAM), a lightweight graph neural network competitive with SOTA models on two challenging DLA datasets - while being an order of magnitude smaller than existing models. In particular, the 4-million parameter GLAM model outperforms the leading 140M+ parameter computer vision-based model on 5 of the 11 classes on the DocLayNet dataset. A simple ensemble of these two models achieves a new state-of-the-art on DocLayNet, increasing mAP from 76.8 to 80.8. Overall, GLAM is over 5 times more efficient than SOTA models, making GLAM a favorable engineering choice for DLA tasks.
文档布局分析的图形化方法
文档布局分析(DLA)的任务是检测文档中不同的语义内容,并将这些内容正确地分类到适当的类别中(例如,文本、标题、图形)。DLA管道使用户能够将文档转换为结构化的机器可读格式,然后可用于许多有用的下游任务。大多数现有的最先进(SOTA) DLA模型将文档表示为图像,丢弃了电子生成的pdf中可用的丰富元数据。直接利用这些元数据,我们将每个PDF页面表示为一个结构化的图,并将DLA问题框架为一个图分割和分类问题。我们介绍了基于图的布局分析模型(GLAM),这是一种轻量级的图神经网络,在两个具有挑战性的DLA数据集上与SOTA模型竞争,同时比现有模型小一个数量级。特别是,400万参数的GLAM模型在DocLayNet数据集的11个类中的5个上优于领先的基于140万+参数的计算机视觉模型。这两个模型的简单集成实现了DocLayNet上的新技术,将mAP从76.8提高到80.8。总体而言,GLAM比SOTA模型效率高5倍以上,使GLAM成为pla任务的有利工程选择。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信