Sebastian Wilde Alarcón Arenas, Yessenia Yari, Graciela Meza-Lovon
2018 XLIV Latin American Computer Conference (CLEI), published 2018-10-01. DOI: 10.1109/CLEI.2018.00080 (https://doi.org/10.1109/CLEI.2018.00080)
A Document Layout Analysis Method Based on Morphological Operators and Connected Components
Over the last few decades, the digital preservation of historical documents has gained considerable attention. To exploit the advantages and opportunities offered by digitized documents, it is necessary to understand their contents. The first step toward that understanding is to determine the locations of the document's entities, such as figures, titles, captions, and text. This paper presents a new hybrid approach to document structure analysis founded on morphological operators and connected components. The proposed method is divided into two stages: preprocessing, in which the quality of the document images is enhanced, and layout analysis, in which we identify three types of layout. We also include a fragmentation process, in which we divide the page image into sections. Finally, we conducted experiments on a dataset of historical newspapers.
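The abstract names the two building blocks, morphological operators and connected components, without giving details. The following is a minimal sketch of how they are commonly combined for block detection. It is not the authors' implementation. The function names, the rectangular structuring element, the 4-connectivity, and the toy page are all assumptions made for illustration. A horizontal dilation merges nearby foreground pixels (characters) into solid runs, and connected-component labeling then yields one bounding box per merged region.

```python
from collections import deque


def dilate(img, half_h, half_w):
    """Binary dilation with a (2*half_h+1) x (2*half_w+1) rectangular element."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if img[r][c]:
                # Paint the structuring element centered on each foreground pixel.
                for rr in range(max(0, r - half_h), min(rows, r + half_h + 1)):
                    for cc in range(max(0, c - half_w), min(cols, c + half_w + 1)):
                        out[rr][cc] = 1
    return out


def connected_components(img):
    """Bounding boxes (top, left, bottom, right) of 4-connected foreground regions."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] and not seen[r][c]:
                # Breadth-first flood fill from an unvisited foreground pixel.
                seen[r][c] = True
                queue = deque([(r, c)])
                top, left, bottom, right = r, c, r, c
                while queue:
                    y, x = queue.popleft()
                    top, bottom = min(top, y), max(bottom, y)
                    left, right = min(left, x), max(right, x)
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes


# Hypothetical binarized page: 1 = ink, 0 = background.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]

raw_boxes = connected_components(page)                  # one box per isolated pixel
block_boxes = connected_components(dilate(page, 0, 1))  # one box per merged run
```

On this toy page the seven isolated pixels collapse into three regions after dilation. In practice the size of the structuring element would need tuning to the scan resolution and the typography of the newspapers.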