A Document Layout Analysis Method Based on Morphological Operators and Connected Components

Sebastian Wilde Alarcón Arenas, Yessenia Yari, Graciela Meza-Lovon
2018 XLIV Latin American Computer Conference (CLEI), published 2018-10-01. DOI: 10.1109/CLEI.2018.00080 (https://doi.org/10.1109/CLEI.2018.00080)
Citations: 1

Abstract

Over the last decades, interest in digitally preserving historical documents has grown considerably. To exploit the advantages and opportunities that digitized documents offer, it is necessary to understand their contents. The first step toward that understanding is to locate the entities of a document, such as figures, titles, captions, and text. This paper presents a new hybrid approach to analyzing document structure that is founded on morphological operators and connected components. The proposed method is divided into two stages: preprocessing, in which the quality of the document images is enhanced, and layout analysis, in which three types of layout are identified. The method also includes a fragmentation process that divides the page image into sections. Finally, experiments were conducted on a dataset of historical newspapers.
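The abstract gives no implementation details, so the following is only a minimal sketch of how morphological operators and connected components are commonly combined in layout analysis, not the authors' method. It assumes an already binarized page (1 = ink, 0 = background). The function names `dilate` and `component_boxes`, the rectangular structuring element, and the toy page are all hypothetical choices made for illustration.

```python
from collections import deque

def dilate(img, kh, kw):
    """Binary dilation with a (2*kh+1) x (2*kw+1) rectangular structuring element."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                # Spread each ink pixel over its rectangular neighbourhood.
                for yy in range(max(0, y - kh), min(h, y + kh + 1)):
                    for xx in range(max(0, x - kw), min(w, x + kw + 1)):
                        out[yy][xx] = 1
    return out

def component_boxes(img):
    """Bounding boxes (top, left, bottom, right) of 4-connected ink components."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(y, x)])
                top, left, bottom, right = y, x, y, x
                while queue:  # breadth-first flood fill of one component
                    cy, cx = queue.popleft()
                    top, bottom = min(top, cy), max(bottom, cy)
                    left, right = min(left, cx), max(right, cx)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes

# Toy binarized page (assumed data): two "words", each made of two separate glyphs.
page = [
    [1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 1],
]
raw_boxes = component_boxes(page)                   # one box per glyph
merged_boxes = component_boxes(dilate(page, 0, 1))  # glyphs fused into blocks
```

On this toy page the four isolated glyphs collapse into two blocks after a horizontal dilation. The resulting boxes are measured on the dilated image, so they extend slightly past the original ink; a real pipeline would typically shrink or re-fit them and choose the structuring element size from the document's character spacing.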