Page Layout Analysis of Text-heavy Historical Documents: a Comparison of Textual and Visual Approaches

Sven Najem-Meyer, Matteo Romanello
Workshop on Computational Humanities Research, published 2022-12-12
DOI: 10.48550/arXiv.2212.13924 (https://doi.org/10.48550/arXiv.2212.13924)
Citations: 3

Abstract

Page layout analysis is a fundamental step in document processing that segments a page into regions of interest. With their highly complex layouts and mixed scripts, scholarly commentaries are text-heavy documents that remain challenging for state-of-the-art models. Their layout varies considerably across editions, and their most important regions are defined mainly by semantic characteristics rather than by graphical ones such as position or appearance. This setting calls for a comparison between textual, visual and hybrid approaches. We therefore assess the performance of two transformers (LayoutLMv3 and RoBERTa) and an object-detection network (YOLOv5). Although the results show a clear advantage for the latter, we also list several caveats to this finding. In addition to our experiments, we release a dataset of ca. 300 annotated pages sampled from 19th-century commentaries.